What is Google Gemini?
Gemini is a powerful artificial intelligence (AI) model from Google that can understand text, images, videos, and audio. As a multimodal model, Gemini is described as capable of completing complex tasks in math, physics, and other areas, and understanding and generating high-quality code in various programming languages.
It is currently available through the Gemini chatbot (formerly Google Bard) and some Google Pixel devices and will gradually be folded into other Google services. During Google I/O 2024, the company announced new features that will come to Gemini, including a new 'Live' mode and integrations with Project Astra. Gemini also powers AI overview in Google searches.
"Gemini is the result of large-scale collaborative efforts by teams across Google, including our colleagues at Google Research," said Demis Hassabis, CEO and co-founder of Google DeepMind, when announcing Gemini.
"It was built from the ground up to be multimodal, which means it can generalize and seamlessly understand, operate across, and combine different types of information including text, code, audio, image, and video."
Who made Gemini?
Gemini was created by Google and Alphabet, Google's parent company, and released as the company's most advanced AI model to date.
Google DeepMind also made significant contributions to the development of Gemini.
Are there different versions of Gemini?
Google describes Gemini as a flexible model capable of running on everything from Google's data centers to mobile devices. To achieve this level of scalability, Gemini was initially released in three sizes: Gemini Nano, Gemini Pro, and Gemini Ultra. Google has since added a fourth, Gemini Flash.
- Gemini Nano 1.0: The Gemini Nano model size is designed to run on smartphones, initially launched on the Google Pixel 8. It's built to perform on-device tasks that require efficient AI processing without connecting to external servers, such as suggesting replies within chat applications, understanding images, or summarizing text. The Gemini Nano model features a 32,000-token context window.
- Gemini Flash 1.5: This model is built for speed, so it's a lightweight and cost-efficient option. The model features a long context window, with a one-million token context by default, enough to process an hour of video or over 30,000 lines of code.
- Gemini Pro 1.5: Running on Google's data centers, Gemini Pro is designed to power the latest version of the company's paid AI chatbot service, Gemini Advanced. This model can deliver fast response times and understand complex queries. Google just upgraded its context window to two million tokens, the longest of any large-scale model available now.
- Gemini Ultra 1.0: Google describes Gemini Ultra as its most capable model, exceeding "current state-of-the-art results on 30 of the 32 widely-used academic benchmarks used in large language model (LLM) research and development." It's designed for highly complex tasks and is available through Vertex AI and Google AI Studio with the Gemini API.
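To put those context windows in perspective, here is a rough Python sketch that checks which of the models above could take a given input in one pass. The window sizes come from the list above (no figure is given for Gemini Ultra, so it is left out). The four-characters-per-token ratio and the helper names are assumptions made for illustration; real tokenizers count differently, so treat the results as ballpark figures.

```python
import math

# Context window sizes quoted above, in tokens.
CONTEXT_WINDOWS = {
    "Gemini Nano 1.0": 32_000,
    "Gemini Flash 1.5": 1_000_000,
    "Gemini Pro 1.5": 2_000_000,
}

# Assumption: about 4 characters per token. Real tokenizers vary.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def models_that_fit(token_count: int) -> list[str]:
    """Return the models whose context window can hold token_count tokens."""
    return [
        name
        for name, window in CONTEXT_WINDOWS.items()
        if token_count <= window
    ]


# "Over 30,000 lines of code" in one million tokens works out to
# roughly 33 tokens per line.
tokens_per_line = 1_000_000 // 30_000
print(models_that_fit(30_000 * tokens_per_line))
```

By this estimate, a 30,000-line codebase fits in the Flash and Pro windows but is far too large for Nano's 32,000 tokens.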
How can you access Gemini?
The fastest way to use the Gemini model is to go to the AI chatbot's website, Gemini.Google.com. You can have a conversation with Gemini through this site like you can with ChatGPT and other AI chatbots.
The Gemini model is available in Google products, like Android-powered devices, the Gemini mobile app, Google searches with an AI overview, Google Photos, and more. Google plans to integrate Gemini further into its Search, Ads, Chrome, and other services.
Developers and enterprise customers can access Gemini Ultra via the Gemini API in Google's AI Studio and Google Cloud Vertex AI. Android developers have access to Gemini Nano via AICore.
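For developers curious what a Gemini API call looks like, here is a minimal Python sketch that prepares a text prompt for the REST endpoint without sending it. It assumes the `v1beta` `generateContent` endpoint and the `x-goog-api-key` header from Google's public API documentation. The default model name and the `build_request` helper are illustrative choices, not something the article specifies, so check the current documentation before relying on them.

```python
import json
import urllib.request

# Assumption: the v1beta REST root of the Gemini API.
API_ROOT = "https://generativelanguage.googleapis.com/v1beta"


def build_request(
    prompt: str,
    api_key: str,
    model: str = "gemini-1.5-flash",  # illustrative default; pick any model you have access to
) -> urllib.request.Request:
    """Prepare (but do not send) a generateContent request for a text prompt."""
    url = f"{API_ROOT}/models/{model}:generateContent"
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-goog-api-key": api_key,
        },
        method="POST",
    )


# Placeholder key for illustration only.
request = build_request("Summarize this article in one sentence.", "YOUR_API_KEY")
print(request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` with a real key from Google AI Studio would send it. Google also publishes official client libraries that wrap these details.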
How does Gemini differ from other AI models, like GPT-4?
Google's new Gemini model appears to be the largest, most advanced AI model to date, though only the widespread release of the Ultra model will show whether that holds. Compared to other popular models that power AI chatbots, Gemini stands out for its native multimodality and its long context window of one million tokens or more.
GPT-4, by comparison, is available in 8k and 32k token contexts.
Compared to GPT-4, a primarily text-based model, Gemini performs multimodal tasks natively. GPT-4 excels at language-related tasks, such as content creation and complex text analysis, but at the time of testing it relied on OpenAI's plugins to analyze images and access the web, and on DALL-E 3 and Whisper to generate images and process audio.
This approach could change when OpenAI makes GPT-4o widely available, as ChatGPT won't rely on three separate models to perform actions and will instead use an omnimodel.
Google's Gemini also appears to be more product-focused than other models available. Gemini is either integrated into the company's ecosystem or has plans to be, as it's powering both the chatbot and Android devices. Other models, like GPT-4 and Meta's Llama, are more service-oriented and available for various third-party developers for applications, tools, and services.