Silicon Valley’s quest to automate everything is relentless, which explains its latest obsession: Auto-GPT.
In essence, Auto-GPT uses the versatility of OpenAI’s latest AI models to interact with software and online services, allowing it to “autonomously” perform tasks like X and Y. But as we are learning with large language models, this capability appears to be as wide as an ocean but as deep as a puddle.
Auto-GPT, which you may have seen blowing up on social media recently, is an open source application created by game developer Toran Bruce Richards that uses OpenAI’s text-generating models, mainly GPT-3.5 and GPT-4, to act in an “autonomous” way.
There is no magic in that autonomy. Auto-GPT simply handles the follow-ups to an initial prompt given to OpenAI’s models, both asking and answering them until a task is complete.
Auto-GPT is basically GPT-3.5 and GPT-4 combined with a companion bot that tells GPT-3.5 and GPT-4 what to do. A user tells Auto-GPT what their goal is, and the bot, in turn, uses GPT-3.5 and GPT-4 and various programs to carry out all the steps necessary to achieve whatever goal the user has set.
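The ask-and-answer loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Auto-GPT’s actual code: `fake_model` is a hypothetical stand-in for a call to GPT-3.5 or GPT-4, and the “DONE” reply used to end the loop is a convention invented for this example.

```python
from typing import Callable, List


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a GPT-3.5/GPT-4 call (no network access)."""
    # Pretend the model needs two steps of work before declaring the task done.
    steps_so_far = prompt.count("Step result:")
    if steps_so_far >= 2:
        return "DONE"
    return f"do step {steps_so_far + 1}"


def run_agent(goal: str, model: Callable[[str], str], max_steps: int = 10) -> List[str]:
    """Feed the model its own previous replies until it says the task is done."""
    history: List[str] = []
    prompt = f"Goal: {goal}"
    for _ in range(max_steps):
        reply = model(prompt)
        if reply == "DONE":  # assumed stop signal, for illustration only
            break
        history.append(reply)
        # The companion bot, not the user, writes the follow-up prompt.
        prompt += f"\nStep result: {reply}"
    return history


steps = run_agent("help me grow my flower business", fake_model)
```

The `max_steps` cap is a design choice of the sketch: without some limit, a loop that prompts itself has no guarantee of ever stopping.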
What makes Auto-GPT reasonably capable is its ability to interface with both online and local applications, software, and services, such as web browsers and word processors. For example, given a prompt such as “help me grow my flower business,” Auto-GPT may develop a somewhat plausible advertising strategy and build a basic website.
As Joe Koen, a software developer who has experimented with Auto-GPT, explained to TechCrunch via email, Auto-GPT essentially automates multi-step projects that would otherwise have required back-and-forth prompting with a chatbot-oriented AI model like, say, OpenAI’s ChatGPT.
“Auto-GPT defines an agent that talks to the OpenAI API,” Koen said. “The purpose of this agent is to execute a variety of commands that the AI generates in response to the agent’s requests. The user is prompted for input to specify the role and goals of the AI before the agent begins executing commands.”
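To make Koen’s description concrete, here is a rough sketch of an agent that executes commands produced by a model. Everything in it is assumed for illustration: the JSON reply shape, the command names `write_note` and `count_words`, and the `execute` helper are hypothetical and are not taken from Auto-GPT’s source.

```python
import json
from typing import Callable, Dict


def write_note(text: str) -> str:
    """Hypothetical command: pretend to save a note and report what happened."""
    return f"saved note: {text}"


def count_words(text: str) -> str:
    """Hypothetical command: count the words in a string."""
    return f"{len(text.split())} words"


# Registry mapping assumed command names to ordinary Python functions.
COMMANDS: Dict[str, Callable[..., str]] = {
    "write_note": write_note,
    "count_words": count_words,
}


def execute(model_reply: str) -> str:
    """Parse a JSON reply from the model and run the command it names."""
    try:
        request = json.loads(model_reply)
        name = request["command"]
        args = request.get("args", {})
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
        return "error: reply was not a valid command"
    handler = COMMANDS.get(name)
    if handler is None:
        return f"error: unknown command {name!r}"
    return handler(**args)


result = execute('{"command": "count_words", "args": {"text": "grow my flower business"}}')
```

Returning an error string instead of raising lets the agent hand the failure back to the model as its next prompt, which is one plausible way to keep such a loop going.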
In a terminal, users describe the name, role, and goal of the Auto-GPT agent, and specify up to five ways to achieve that goal. For example:
- Name: Smartphone-GPT
- Role: An AI designed to find the best smartphone
- Aim: Find the best smartphones on the market
- Goal 1: Conduct market research for different smartphones in today’s market
- Goal 2: Get the five best smartphones and list their advantages and disadvantages
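The settings above map naturally onto a small data structure. The sketch below is one hedged way to model them in Python; the `AgentConfig` class, its field names, and the prompt layout produced by `to_prompt` are assumptions made for this example rather than Auto-GPT’s real format.

```python
from dataclasses import dataclass, field
from typing import List

MAX_GOALS = 5  # the article says users may specify up to five


@dataclass
class AgentConfig:
    """Hypothetical container for the values a user types into the terminal."""
    name: str
    role: str
    aim: str
    goals: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject configurations that exceed the five-goal limit.
        if len(self.goals) > MAX_GOALS:
            raise ValueError(f"at most {MAX_GOALS} goals are allowed")

    def to_prompt(self) -> str:
        """Render the settings as an opening prompt (layout is illustrative)."""
        lines = [f"You are {self.name}, {self.role}.", f"Aim: {self.aim}", "Goals:"]
        lines += [f"{i}. {goal}" for i, goal in enumerate(self.goals, start=1)]
        return "\n".join(lines)


config = AgentConfig(
    name="Smartphone-GPT",
    role="an AI designed to find the best smartphone",
    aim="Find the best smartphones on the market",
    goals=[
        "Conduct market research for different smartphones in today's market",
        "Get the five best smartphones and list their advantages and disadvantages",
    ],
)
opening_prompt = config.to_prompt()
```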
Behind the scenes, Auto-GPT relies on features like memory management to run tasks, along with GPT-4 and GPT-3.5 for text generation, file storage, and summarization.
Auto-GPT can also be connected to speech synthesizers, like the ones from ElevenLabs, so that it can “place” phone calls, for example.
Auto-GPT is publicly available on GitHub, but requires some setup and knowledge to get up and running. To use it, Auto-GPT must be installed in a development environment such as Docker and must be registered with an OpenAI API key, which requires a paid OpenAI account.
It might be worth it, though the jury is out on that. Early adopters have used Auto-GPT to take on the kind of mundane tasks that are best delegated to a bot. For example, Auto-GPT can handle tasks like debugging code and writing an email, or more advanced ones like creating a business plan for a new startup.
“If Auto-GPT encounters any obstacles or is unable to complete the task, it will develop new prompts to help it navigate the situation and determine the appropriate next steps,” Adnan Masood, the chief architect at UST, a technology consulting firm, told TechCrunch in an email. “Large language models excel at generating human-like responses, but rely on user prompts and interactions to deliver the desired results. By contrast, Auto-GPT takes advantage of the advanced capabilities of the OpenAI API to operate independently without user intervention.”
New apps have emerged in recent weeks to make Auto-GPT even easier to use, such as AgentGPT and GodMode, which provide a simple interface where users can enter what they want to achieve directly into a browser page. Note that, like Auto-GPT, both require an OpenAI API key to unlock their full capabilities.
However, like any powerful tool, Auto-GPT has its limitations and risks.
Depending on the objective the tool is given, Auto-GPT can behave in very… unexpected ways. One Reddit user claims that, given a budget of $100 to spend inside a server instance, Auto-GPT created a wiki page about cats, exploited a flaw in the instance to gain admin-level access, took over the Python environment in which it was running, and then “killed” itself.
There’s also ChaosGPT, a modified version of Auto-GPT with goals like “destroy humanity” and “establish global dominance.” Unsurprisingly, ChaosGPT hasn’t come close to bringing about the robot apocalypse, but it has tweeted rather unflatteringly about humanity.
However, arguably more dangerous than Auto-GPT trying to “destroy humanity” are the unforeseen problems that can arise in perfectly normal scenarios. Because it is built on OpenAI’s language models, which, like all language models, are prone to inaccuracies, it can make mistakes.
That is not the only problem. After successfully completing a task, Auto-GPT usually doesn’t remember how to perform it for later use, and even when it does, it often doesn’t remember to use the program. Auto-GPT also has a hard time effectively breaking complex tasks into simpler subtasks and has trouble understanding how different goals overlap.
“Auto-GPT illustrates the power and unknown risks of generative AI,” Clara Shih, executive director of Service Cloud at Salesforce and an Auto-GPT enthusiast, said via email. “For enterprises, it is especially important to include a human in the loop when developing and using generative AI technologies like Auto-GPT.”