We’ve officially passed the second anniversary of the start of the AI boom, and things haven’t slowed down. Just the opposite: generative AI is expanding into new platforms, mediums, and even devices at a pace that feels nearly overwhelming.
Here are the 10 announcements that made 2024 a monumental year in the world of AI.
OpenAI releases GPT-4o
When ChatGPT (running GPT-3.5) first arrived in November 2022, it was basically a fancy, computer-controlled game of Mad Libs. Don’t get me wrong, even that capability was revolutionary at the time, but it wasn’t until the release of GPT-4o in May 2024 that generative AI systems truly came into their own.
Building on its predecessor’s ability to analyze and generate both text and images, GPT-4o provides a more comprehensive contextual understanding than GPT-4 alone. This translates to better performance in everything from image captioning and visual analysis to generating both creative and analytical content like graphs, charts, and images.
Advanced Voice Mode helps computers speak like humans
In September, OpenAI once again showed why it is the leading artificial intelligence firm by releasing its Advanced Voice Mode to ChatGPT subscribers. This feature eliminated the need for users to type their questions into a prompt window, instead enabling them to converse with the AI as they would another person.
Leveraging GPT-4o’s human-equivalent response times, Advanced Voice Mode fundamentally changed how people can interact with machine intelligence and helped users unleash the AI’s full creative capacity.
Generative AI comes to the edge
When ChatGPT debuted in 2022, it was the only AI in town and available in precisely one place: ChatGPT.com. Oh, what a difference two years makes. These days, you can find generative AI in everything from smartphones and smart home devices to autonomous vehicles and health-monitoring gadgets. ChatGPT, for example, is available as a desktop app, an API, a mobile app, and even via an 800 number. Microsoft, for its part, has integrated AI directly into its line of Copilot+ laptops.
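For readers curious what the API access mentioned above looks like in practice, here is a minimal sketch of the request body for an OpenAI-style chat-completions call. The endpoint URL and model name are shown for illustration and may differ from your account's configuration; no request is actually sent here.

```python
import json

# Illustrative endpoint for OpenAI's chat-completions API (assumption:
# this is the current path; check the official API reference).
API_URL = "https://api.openai.com/v1/chat/completions"

# The JSON body pairs a model name with a list of role-tagged messages.
payload = {
    "model": "gpt-4o",  # the multimodal model discussed above
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize 2024's biggest AI news in one sentence."},
    ],
}

# In a real call, this body would be POSTed with an
# "Authorization: Bearer <API key>" header; here we only serialize it.
body = json.dumps(payload)
print(body[:30])
```

The same message-list shape carries over to the mobile and desktop apps under the hood, which is part of why the model could spread to so many surfaces so quickly.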
Perhaps the most significant example, of course, is Apple Intelligence. It might not have been the most successful launch (we’re still waiting for many of its features), but in terms of making the powers of generative AI as accessible as possible, nothing was as important as Apple Intelligence.
Now, neither Copilot+ PCs nor Apple Intelligence panned out the way the companies involved probably wanted (especially for Microsoft), but as we all know, this is only the beginning.
The resurgence of nuclear power production
Before this year, nuclear power was seen as a losing proposition in America, deemed unreliable and unsafe due in large part to the Three Mile Island incident of 1979, in which one of the plant’s reactors partially melted down and released radioactive material into the atmosphere. However, with the rapidly increasing amounts of electrical power that modern large language models require, and the massive stress they place on regional power grids, many leading AI firms are taking a closer look at running their data centers on the power of the atom.
Amazon, for example, purchased a nuclear-powered AI data center from Talen in March, then signed an agreement to acquire miniaturized, self-contained Small Modular Reactors (SMRs) from Energy Northwest in October. Microsoft, not to be outdone, has purchased the production capacity of Three Mile Island itself and is currently working to get Reactor One back online and generating electricity.
Agents are poised to be the next big thing in generative AI
Turns out, there’s only so much training data, power, and water you can throw at the task of growing your large language model before you run into diminishing returns. The AI industry experienced this firsthand in 2024 and, in response, has begun to pivot away from the massive LLMs that originally defined the generative AI experience in favor of agents: smaller, more responsive models designed to perform specific tasks rather than try to do everything a user might ask of them.
Anthropic debuted its agent, dubbed Computer Use, in October. Microsoft followed suit with Copilot Actions in November, while OpenAI is reportedly set to release its agent feature in January.
The rise of reasoning models
Many of today’s large language models are geared more toward generating responses as quickly as possible, often at the expense of accuracy and correctness. OpenAI’s o1 reasoning model, which the company released as a preview in September and as a fully functional model in December, takes the opposite approach: It sacrifices response speed to internally verify its rationale for a given answer, ensuring that it is as accurate and complete as possible.
While this technology has yet to be fully embraced by the public (o1 is currently only available to Plus and Pro tier subscribers), leading AI companies are pressing ahead with versions of their own. Google announced its answer to o1, dubbed Gemini 2.0 Flash Thinking Experimental, on December 19, while OpenAI revealed that it is already working on o1’s successor, which it calls o3, during its 12 Days of OpenAI live-stream event on December 20.
AI-empowered search spreads across the internet
Generative AI is seemingly everywhere these days, so why wouldn’t it be integrated into one of the internet’s most basic features? Google has been toying with the technology for the past two years, first releasing the Search Generative Experience in May of 2023 before rolling out its AI Overview feature this past May. AI Overview places a generated summary of the information a user requests at the top of the search results page.
Perplexity AI takes that technique a step further. Its “answer engine” scours the internet for the information a user requests, then synthesizes that data into a coherent, conversational (and cited) response, effectively eliminating the need to click through a list of links. OpenAI, ever the innovator, developed a nearly identical system for its chatbot, dubbed ChatGPT Search, which it debuted in October.
Anthropic’s Artifacts kicks off a collaborative revolution
Trying to generate, analyze, and edit large files — whether they’re long-form creative essays or computer code snippets — directly within the chat stream can be overwhelming, requiring you to endlessly scroll back and forth to view the entirety of the document.
Anthropic’s Artifacts feature, which debuted in June, helps mitigate that issue by providing users with a separate preview window in which to view the AI-crafted text outside of the main conversation. The feature proved to be such a hit that OpenAI quickly followed suit with its own version.
Anthropic’s latest models and features have made it a formidable rival to OpenAI and Google this year, which alone feels significant.
Image and video generators finally figure out fingers
It used to be that spotting an AI-generated image or video was as simple as counting the subject’s appendages: anything with more than two arms, two legs, and 10 fingers was obviously machine-made, as Stable Diffusion 3’s Cronenberg-esque images demonstrated in June. Yet as 2024 comes to a close, differentiating between human and machine-made content has become significantly more difficult, as image and video generators have rapidly improved both the quality and physiological accuracy of their outputs.
AI video systems like Kling, Gen 3 Alpha, and Movie Gen are now capable of generating photorealistic clips with minimal distortion and fine-grain camera control, while the likes of Midjourney, Dall-E 3, and Imagen 3 can craft still images with a startling degree of realism (and minimal hallucinated artifacts) in myriad artistic styles.
Oh yeah, and OpenAI’s Sora finally made its debut as part of the company’s December announcements. The battle among AI video generators is heating up, and the models got shockingly impressive in 2024.
Elon Musk’s $10 billion effort to build the world’s biggest AI training cluster
xAI launched Grok 2.0 this year, the latest model built right into X. But the bigger news around Elon Musk’s AI venture is where it is headed next. In 2024, Musk set about constructing the “world’s largest supercomputer” just outside of Memphis, Tennessee, which came online at 4:20 a.m. on July 22. Driven by 100,000 Nvidia H100 GPUs, the supercluster is tasked with training new versions of xAI’s Grok generative AI model, which Musk claims will become “the world’s most powerful AI.”
Musk is expected to spend around $10 billion in capital and inference costs in 2024 alone but is reportedly working to double the number of GPUs powering the supercomputer in the new year.