How ‘A.I. Agents’ That Roam the Internet Could One Day Replace Workers

The widely used chatbot ChatGPT was designed to generate digital text, everything from poetry to term papers to computer programs. But when a team of artificial intelligence researchers at the computer chip company Nvidia got their hands on the chatbot’s underlying technology, they realized it could do a lot more.

Within weeks, they taught it to play Minecraft, one of the world’s most popular video games. Inside Minecraft’s digital universe, it learned to swim, gather plants, hunt pigs, mine gold and build houses.

“It can go into the Minecraft world and explore by itself and collect materials by itself and get better and better at all kinds of skills,” said an Nvidia senior research scientist, Linxi Fan, who is known as Jim.

The project was an early sign that the world’s leading artificial intelligence researchers are transforming chatbots into a new kind of autonomous system called an A.I. agent. These agents can do more than chat. They can use software apps, websites and other online tools, including spreadsheets, online calendars, travel sites and more.

In time, many researchers say, the A.I. agents could become far more sophisticated, and could replace office workers, automating almost any white-collar job.

“This is a huge commercial opportunity, potentially trillions of dollars,” said Jeff Clune, a computer science professor at the University of British Columbia who previously worked on this kind of technology as a researcher at OpenAI, the San Francisco start-up that built ChatGPT. “This has a huge upside — and huge consequences — for society.”

Nvidia’s agent plays a game. Similar agents can schedule meetings, edit files, analyze data and build multicolored bar charts. The idea is that these automated systems will eventually act as personal assistants able to handle a wide range of tasks across the internet.

From left, Anima Anandkumar, senior director of A.I. research at Nvidia, with Yuke Zhu and Jim Fan, both senior research scientists. Credit: Gabriela Hasbun for The New York Times

Today’s agents are limited, and they can’t exactly organize your life. ChatGPT can search the travel site Expedia for flights to New York, but you still have to book the reservation on your own.

This technology, as researchers improve it, could make office workers and consumers more efficient. It could also change the nature of video games, providing a new wave of bots that gamers can play alongside and chat with.

GPT-4, the technology that underpins ChatGPT, is what researchers call a large language model. It is an A.I. system that learns skills by analyzing huge amounts of data.

Over the past several months, the technology has wowed hundreds of millions of people with the way it generates emails, writes speeches and riffs on almost any topic. But its most important skill may be its knack for writing computer programs.


It can instantly generate a program that draws a unicorn or drops digital snow across your laptop screen. Professional software developers can ask for code that they can fold into larger programs, including everything from social media apps to search engines. But that is only part of what this technology can do. It can also generate computer code that taps into other software apps and websites.

This is how Dr. Fan and other Nvidia researchers taught GPT-4 to play Minecraft. “The most important word here is code,” Dr. Fan said. “Code can take actions.”

People use software apps and websites by touching buttons, menus and other graphical widgets. A.I. agents use apps and websites by accessing their application programming interfaces, or A.P.I.s — the underlying software code that lets them communicate with other online services.

If you ask an agent to upload a video to the internet, for instance, it could generate code that called an A.P.I. offered by YouTube. “An A.P.I. is just text used to talk to a machine,” said Silen Naihin, a researcher who helps run an independent A.I. agent project, AutoGPT.
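To make Mr. Naihin's point concrete, here is a minimal sketch of what such a request might look like when an agent assembles it. The endpoint `https://api.example.com/v1/videos`, the field names and the helper `build_upload_request` are hypothetical placeholders chosen for illustration; they are not YouTube's actual A.P.I. The request is only built and printed, never sent.

```python
import json
from urllib.request import Request


def build_upload_request(title: str, video_url: str) -> Request:
    """Assemble (but do not send) an HTTP request to a hypothetical video API."""
    # The body of the request is plain text: a JSON document.
    payload = json.dumps({"title": title, "source_url": video_url}).encode("utf-8")
    return Request(
        "https://api.example.com/v1/videos",  # hypothetical endpoint, not a real service
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_upload_request("My trip", "https://example.com/trip.mp4")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Everything the agent produces here is text: a method, an address and a small JSON document. Actually sending it would also require credentials, which is one reason unrestricted access is a security concern.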

In theory, a chatbot can write code to access any A.P.I. on the internet. But today’s chatbots are not yet adept enough to handle more than simple tasks. And even if they were, letting them freely roam the internet would be an enormous security risk. So companies are starting small.

A few months after OpenAI unveiled ChatGPT, it quietly released a way for the chatbot to do more than generate text. After installing various plug-ins (software that augments what the bot can do), you could ask it to search travel sites like Expedia for available flights, grab a map of your hometown from Google Earth or even transform a spreadsheet detailing your yearly spending into a multicolored bar chart.

Equipped with a plug-in called code interpreter, ChatGPT could not just write code but also run it. This allowed the technology to instantly perform tasks it could not in the past, including editing spreadsheets and transforming still images into videos. Google, Microsoft and other companies are exploring similar technologies.
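The difference between writing code and running it can be sketched in a few lines. In the example below, the string `generated_code` stands in for text a chatbot might produce, and the helper `run_generated` is a hypothetical name; neither reflects how OpenAI's plug-in is actually built. Python's built-in `exec` is used only for illustration. It is not a sandbox, and a real system would need to isolate the code it runs.

```python
# Stand-in for text a chatbot might generate (an assumption for this sketch).
generated_code = """
def yearly_total(rows):
    return sum(amount for _, amount in rows)
"""


def run_generated(code: str, func_name: str, *args):
    """Execute generated source text, then call one function it defines."""
    namespace: dict = {}
    exec(code, namespace)  # illustration only: exec provides no isolation
    return namespace[func_name](*args)


spending = [("rent", 12000), ("food", 4800), ("travel", 1500)]
total = run_generated(generated_code, "yearly_total", spending)
print(total)  # 18300
```

Once the generated text is executed, the system has gone from describing a calculation to performing it.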

“These are projects where we’re envisioning essentially A.I.s working with other A.I.s on your behalf,” Ashley Llorens, a vice president at Microsoft, said.

Independent projects such as AutoGPT are trying to take this approach several steps further. The idea is to give the system goals like “create a company” or “make some money.” Then it will look for ways of reaching that goal by asking itself questions and connecting to other internet services.

Today, this does not work all that well. Systems like AutoGPT tend to get stuck in endless loops. But researchers like Dr. Fan are constantly refining this kind of technology in an effort to make it more useful and more reliable.
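A rough sketch shows why such loops happen and one simple way to guard against them. The function `run_agent`, the `"DONE"` signal and the toy `stuck_planner` below are hypothetical and are not taken from AutoGPT. The planner stands in for a language model that keeps proposing the same two actions.

```python
from typing import Callable, List, Tuple


def run_agent(
    goal: str,
    propose_action: Callable[[str, List[str]], str],
    max_steps: int = 10,
) -> Tuple[List[str], str]:
    """Ask the planner for actions until it finishes, repeats itself or runs out of steps."""
    history: List[str] = []
    for _ in range(max_steps):
        action = propose_action(goal, history)
        if action == "DONE":
            return history, "finished"
        if action in history:
            # The planner proposed something it already tried: stop instead of looping.
            return history, "loop detected"
        history.append(action)
    return history, "step budget exhausted"


def stuck_planner(goal: str, history: List[str]) -> str:
    """Toy planner that alternates between two actions forever."""
    return "search the web" if len(history) % 2 == 0 else "read results"


history, status = run_agent("make some money", stuck_planner)
print(status, history)
```

The step budget and the repeat check are deliberately crude. They stop a runaway agent but do nothing to help it reach its goal.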

Other researchers are building a new kind of A.I. agent designed for using software tools. In summer 2022, Dr. Clune was among a team of OpenAI researchers who built an agent that could use computer software much as a person would — mouse click by mouse click, keystroke by keystroke.


Dr. Clune and his colleagues fed the system hours of online videos that showed people playing Minecraft. By analyzing the way people used their mouse and keyboard to navigate through Minecraft’s digital universe, the system learned to play the game on its own.

Other companies, including a start-up called Adept, are building similar agents that use websites like Wikipedia, Redfin and Craigslist and popular office apps from companies like Salesforce.

Dr. Clune argues that this kind of agent will eventually allow artificial intelligence to use a much broader range of software apps and websites. He said everyone would have access to a digital assistant that could potentially do almost anything on the internet. That could make life easier — but it could also replace countless jobs.

“If A.I. can do anything we can do, it does not just replace the boring tasks,” he said. “It replaces all the tasks.”
