AI Agents: Saviours or Job Destroyers? The Controversy of the Next AI Revolution
Researchers are reshaping chatbots into online agents that can play games, query websites, schedule meetings, build charts, and carry out various other tasks. An AI agent is a software application that emulates intelligent behavior. These agents can engage with their surroundings, gather information, and take actions in order to accomplish specific objectives.
AI agents span a spectrum of complexity, from basic rule-based systems to highly intricate machine learning models. What sets AI agents apart is their autonomy; they operate independently, not requiring direct human control, utilizing sensors and actuators to interact with their environment. Furthermore, AI agents have the capacity to learn and adapt from their interactions with the environment to achieve their defined goals.
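The sense, decide, act cycle described above can be illustrated with a minimal sketch at the simplest end of that spectrum: a rule-based agent with no learning. The `Room`, `ThermostatAgent`, and `run` names are hypothetical and chosen only for illustration; they do not come from any system mentioned in this article.

```python
from dataclasses import dataclass


@dataclass
class Room:
    """Toy environment: a room whose temperature the agent can sense and change."""
    temperature: float

    def sense(self) -> float:
        # Sensor: report the current temperature.
        return self.temperature

    def apply(self, action: str) -> None:
        # Actuator: heating or cooling shifts the temperature by one degree.
        if action == "heat":
            self.temperature += 1.0
        elif action == "cool":
            self.temperature -= 1.0


@dataclass
class ThermostatAgent:
    """Rule-based agent that steers the room toward a target temperature."""
    target: float
    tolerance: float = 0.5

    def decide(self, reading: float) -> str:
        if reading < self.target - self.tolerance:
            return "heat"
        if reading > self.target + self.tolerance:
            return "cool"
        return "idle"


def run(agent: ThermostatAgent, room: Room, max_steps: int = 20) -> list:
    """Sense, decide, act until the agent idles or the step budget runs out."""
    history = []
    for _ in range(max_steps):
        action = agent.decide(room.sense())
        history.append(action)
        if action == "idle":
            break
        room.apply(action)
    return history


room = Room(temperature=18.0)
history = run(ThermostatAgent(target=21.0), room)
print(history, room.temperature)
```

The agent runs without a person choosing each step, which is the autonomy the paragraph describes. A learning agent would replace the fixed rules in `decide` with a model that is updated from experience.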
The widely adopted chatbot, ChatGPT, was initially designed to create digital text, spanning everything from poetry to term papers to computer programs. However, a team of artificial intelligence researchers at Nvidia, a prominent computer chip company, discovered that the underlying technology of ChatGPT held far more potential.
In a matter of weeks, they harnessed this technology to build a system that excels at Minecraft, one of the world’s most popular video games. Inside the virtual universe of Minecraft, the system quickly learned to swim, forage for plants, hunt pigs, mine gold, and even construct houses.
According to Nvidia’s Senior Research Scientist, Linxi Fan, known as Jim, “It can go into the Minecraft world and explore by itself and collect materials by itself and get better and better at all kinds of skills.” This innovative project offered an early glimpse into the transformation of chatbots into a new breed of autonomous systems referred to as AI agents. These agents are not confined to mere conversation; they can actively use software applications, websites, and various online tools, including spreadsheets, digital calendars, travel sites, and more.
In the future, many experts believe that these AI agents could become significantly more sophisticated, potentially leading to the automation of nearly any white-collar job. As Jeff Clune, a computer science professor at the University of British Columbia, formerly with OpenAI, the company behind ChatGPT, puts it, “This is a huge commercial opportunity, potentially trillions of dollars. This has a huge upside — and huge consequences — for society.”
Nvidia’s AI agent can play video games, and similar agents can schedule meetings, edit files, analyze data, and generate colorful bar charts. The vision is that these automated systems will evolve into personal assistants capable of handling a broad array of tasks across the internet.
While today’s AI agents have limitations and cannot entirely organize your life, they offer the promise of making office work and daily tasks more efficient. They may also bring about a fundamental change in the world of video games by introducing a new generation of AI-driven game companions that players can interact with.
The technology underlying ChatGPT, known as GPT-4, is classified as a large language model, a type of AI system that learns skills by analyzing extensive datasets. In recent months, GPT-4 has astounded millions of people by generating emails, crafting speeches, and engaging in conversations on diverse topics. However, one of its most remarkable skills is writing computer programs.
GPT-4 can instantly create code to draw a unicorn on your screen or simulate falling digital snow. Proficient software developers can request code snippets that can be integrated into larger applications, ranging from social media tools to search engines. But its capabilities go beyond this; it can generate code that interfaces with various software applications and websites.
To play Minecraft, Nvidia researchers harnessed GPT-4’s ability to generate code. The key word here, according to Dr. Fan, is “code,” which empowers the system to take actions. People interact with software apps and websites through graphical interfaces, such as buttons and menus. AI agents instead use application programming interfaces (APIs), which let one program send requests directly to another program or online service.
For instance, if you instruct an AI agent to upload a video to the internet, it can generate code that uses YouTube’s API for this purpose. In theory, a chatbot can generate code for virtually any internet API. However, current chatbots are not yet proficient enough to perform complex tasks, and even if they were, granting them unrestricted access to the internet would pose significant security risks. Therefore, companies are proceeding with caution.
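One possible way to act on that caution is to let the model propose an action but have ordinary software decide whether to run it. The sketch below assumes the model emits a small JSON request naming a tool, which is an illustrative assumption and not how any particular product works. The `upload_video` function is a hypothetical stand-in that only returns a string; it does not call YouTube’s API or any other real service.

```python
import json


def upload_video(title: str, path: str) -> str:
    # Hypothetical stand-in: a real agent would call a video service's API here.
    return f"uploaded {path!r} as {title!r}"


def delete_account(user: str) -> str:
    # A risky action that is deliberately left off the allow-list below.
    return f"deleted {user}"


# Only tools listed here can ever be run on the model's behalf.
ALLOWED_TOOLS = {"upload_video": upload_video}


def dispatch(request_json: str) -> str:
    """Run a model-proposed tool call only if the tool is on the allow-list."""
    request = json.loads(request_json)
    name = request["tool"]
    tool = ALLOWED_TOOLS.get(name)
    if tool is None:
        raise PermissionError(f"tool {name!r} is not permitted")
    return tool(**request["args"])


ok = dispatch('{"tool": "upload_video", "args": {"title": "Demo", "path": "demo.mp4"}}')
print(ok)

try:
    dispatch('{"tool": "delete_account", "args": {"user": "alice"}}')
    blocked = False
except PermissionError:
    blocked = True
print("blocked:", blocked)
```

The design choice is that the agent never receives open-ended access: anything outside the allow-list is refused before it runs.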
Several months after the introduction of ChatGPT, OpenAI discreetly introduced the ability for the chatbot to do more than generate text. By installing various plug-ins, which are software augmentations, users could instruct ChatGPT to perform tasks such as searching travel websites for flight options, retrieving maps from Google Earth, or converting an expenditure spreadsheet into a multicolored bar chart.
With the addition of a plug-in called a “code interpreter,” ChatGPT was not only capable of writing code but also executing it. This advancement allowed the technology to perform tasks it couldn’t before, including spreadsheet editing and transforming static images into videos. Companies like Google and Microsoft are actively exploring similar technologies.
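The difference between writing code and executing it can be shown with a toy example. This is only a sketch of the idea, not a description of how the ChatGPT plug-in is built. The `generated` string is a hand-written stand-in for code a model might produce, and `run_generated_code` is a hypothetical helper. Python’s built-in `exec` is not a security sandbox, so a real system would need proper isolation.

```python
import contextlib
import io


def run_generated_code(source: str) -> str:
    """Execute a snippet of Python source and return whatever it printed."""
    buffer = io.StringIO()
    namespace = {}  # fresh globals, so the snippet cannot see the caller's variables
    with contextlib.redirect_stdout(buffer):
        exec(source, namespace)  # NOT a sandbox; illustration only
    return buffer.getvalue()


# Stand-in for model output: summarize a small expense table.
generated = """
expenses = {"rent": 1200, "food": 450, "travel": 300}
for item, amount in expenses.items():
    print(f"{item}: {amount}")
print("total:", sum(expenses.values()))
"""

output = run_generated_code(generated)
print(output)
```

Once the output of the run is fed back to the chatbot, it can report a computed result instead of only describing the code that would produce it.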
The ultimate goal is for AI agents to collaborate with other AI systems on the user’s behalf. This vision is being pursued by independent projects like AutoGPT, which seeks to imbue the system with goals like “create a company” or “make some money.” The system then autonomously explores ways to achieve these goals, asking questions and connecting with internet services to fulfill its objectives.
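A goal-driven loop of this kind can be sketched in a few lines. The code below is not AutoGPT’s implementation. The `propose` callback is a hypothetical stand-in for a language model that suggests the next action, and the two planners are scripted for illustration. Because an open-ended loop may never reach its goal, the sketch caps the number of steps and stops when the same action is proposed twice in a row.

```python
from typing import Callable, List


def run_goal_loop(
    goal: str,
    propose: Callable[[str, List[str]], str],
    max_steps: int = 10,
) -> List[str]:
    """Ask `propose` for actions until it says "done", stalls, or the budget runs out."""
    history: List[str] = []
    for _ in range(max_steps):
        action = propose(goal, history)
        if action == "done":
            break
        if history and action == history[-1]:
            # Same action twice in a row: treat it as a stall and stop.
            history.append("stopped: repeated action")
            break
        history.append(action)
    return history


def scripted_planner(goal: str, history: List[str]) -> str:
    # Hypothetical stand-in for a model that makes progress and then finishes.
    plan = ["search for ideas", "draft a plan", "done"]
    return plan[len(history)] if len(history) < len(plan) else "done"


def stuck_planner(goal: str, history: List[str]) -> str:
    # Hypothetical stand-in for a model that keeps proposing the same step.
    return "search for ideas"


finished = run_goal_loop("make some money", scripted_planner)
stalled = run_goal_loop("make some money", stuck_planner)
print(finished)
print(stalled)
```

A real agent would execute each action and feed the result back to the model before asking for the next one; the step budget and stall check remain useful either way.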
Current systems like AutoGPT have limitations and can get stuck in endless loops, but researchers like Dr. Fan are working to make the technology more practical and reliable. Other researchers are developing agents designed to interact with software tools directly. In the summer of 2022, a team of researchers that included Dr. Clune, then at OpenAI, built an agent that operates computer software the way a person would, using mouse clicks and keyboard inputs.
The agent learned to play Minecraft by analyzing how people used their mouse and keyboard to navigate the game’s digital world. Other companies, including startups like Adept, are working on similar agents that can interact with websites like Wikipedia, Redfin, and Craigslist, and with widely used office applications such as those from Salesforce. Dr. Clune believes that these agents will eventually empower artificial intelligence to use a wider array of software applications and websites. While this could make life more convenient, it also raises the possibility of replacing numerous jobs.
Dr. Clune suggests that if AI can perform any task humans can, it will not just automate boring tasks but potentially take over all tasks. This ongoing development in AI agents holds the promise of transforming the way we work, play, and interact with technology, with profound implications for various industries and the broader society.
Ultimately, the potential is vast, offering the prospect of enhancing productivity and convenience. However, it also raises questions about the impact on the job market and the nature of human-computer interactions. As we continue to witness the growth of AI agents and their integration into our daily lives, it is crucial to navigate this transformation responsibly and ethically, ensuring that these technologies complement human abilities and contribute positively to society’s progress.