AI Agents: The Next Generation
1. Cognitive architecture (Knowledge, Perception, Memory, Reasoning, Metacognition, Action)
What is an AI Agent?
Any autonomous agent needs to be able to resolve two things: what to do next and how to do it.
The word agent comes from the Latin agentem, meaning "one who acts". The term originally referred only to living beings, emphasizing self-sufficiency and autonomy. More than ever, AI is now claiming this name as its own. In the visions of MindOS, OpenAI, and other emerging startups, we see the future of AI agents as social participants. But what exactly is an AI agent? And what gives them such abilities? Today is the day to find out.
The concept of agents in computer science was first proposed by James Sakoda, whose experience in Japanese-American internment camps inspired him to bring the study of human behavior to the computer age. Over the years, researchers and engineers have made significant advancements in the field of agent-based systems. The most famous example is reinforcement learning (RL), where an agent refers to the entity that interacts with the environment to learn and make decisions in order to maximize a cumulative reward. Those agents learn through a trial-and-error process, where they take actions in the environment, receive feedback in the form of rewards or penalties, and adjust their behavior accordingly.
As artificial intelligence (AI) technology progressed, the role of agents expanded. AI agents are now capable of mimicking human behaviors and making autonomous decisions in ever-changing environments. One of the latest advancements is LLM-based agents. These agents are trained on massive amounts of text data and can generate human-like responses and perform various complicated tasks. These LLM-based agents have shown great potential not only in natural language understanding and generation but also in human-level cognitive processes like perceiving, reasoning, planning, learning, and self-reflection. From the early days of agent-based systems to the current state-of-the-art LLM-based AI agents, researchers and engineers have made remarkable progress in harnessing the power of agents to solve complex problems and enhance human-AI interaction.
1. Definition and Characteristics of AI Agents
AI agents were pushed to the front of the stage immediately after LLM-powered chatbots, yet there is still no consensus on their definition. Here, we combine the etymological characteristics and principal features to define them as follows:
An AI agent is the smallest unit that can solve complex problems autonomously. It proactively splits and executes tasks, and finally delivers results at a level that humans can understand and accept.
1. Autonomous: the most distinctive feature of AI agents is their autonomy. AI agents can function independently, without the need for constant human intervention. This capability allows AI agents to perform tasks based on their analysis and understanding of the situation at hand.
2. Adaptive: AI agents can learn, evolve, and adjust their behavior based on changing circumstances and new information. This feature allows AI agents to continuously improve their performance and adapt to different situations.
3. Complex tasks: AI agents are meant to accomplish multi-step, complex tasks at the human level. After all, there is no need to replace a simple button or form with a complicated, fully functioning AI agent.
4. Follow human instructions: AI agents must align with humans. This has nothing to do with Anthropocentrism. AI agents should follow human instructions and deliver the results back to mankind. Human ownership can avoid many technical risks and ethical issues and the supervision of humans will continue to be one of the main features of AI agents for a long time.
2. Why is LLM most suited for developing AI agents?
LLM-based agents are currently the most talked-about AI agents on the market. However, AI agents were around before this LLM wave. The earlier versions, like RL agents, could also complete tasks under specific rules, though they were far from as intelligent as today's agents. This raises the question: Is the LLM the only path for agents to achieve their ultimate form?
The characteristics listed above may help to answer this question. Since AI agents must follow human instructions, they must comprehend human intent. In the real world, humans understand each other using natural language, which contains a wealth of knowledge and information. This is the most natural interface for human-environment interaction. Therefore, AI agents must be developed on the foundation of natural language, whether in the form of existing LLMs or future advanced NLP technologies. Only then can an agent best comprehend what to do and how to do it properly.
However, natural language is ambiguous: a single statement may elicit a variety of understandings, each of which leads to a distinct course of action. This compounds the difficulty of learning and using it for an agent that is a complete novice. Thus, an intelligent agent that can follow human instructions and solve tasks autonomously must be able to deal with ambiguity and transform ambiguous inquiries into executable actions.
Traditional agents were restricted to certain workflows, whereas state-of-the-art LLMs provide two noteworthy benefits: (1) a huge knowledge reserve: due to the large scale of their training data and parameters, contemporary LLMs have abundant common sense and general domain knowledge that was previously unattainable by conventional knowledge bases; (2) emerging reasoning skills: LLMs have demonstrated promise on challenging reasoning tasks. Even though these models' reasoning skills are still in their infancy, they are significantly superior to rule-based reasoning. This sort of reasoning capacity is a crucial requirement for AI agents to be effective in everyday situations. Therefore, although innovative methods may appear in the future, natural language will remain central for a while, which makes LLMs undoubtedly the best option for now.
What makes up an AI agent?
Two essential components are required for an AI agent to function effectively: cognitive architecture and data. The first is the primary focus of the creation of AI agents, while the second is the foundation for their long-term, ongoing improvements.
There have been several attempts to describe the framework of AI agents. Lilian Weng, for instance, separated the agent into three basic components: planning, memory, and tool use. Reviews and surveys go into further detail about cognitive architecture, including the brain, perception, and action. A bit earlier, Yann LeCun proposed an autonomous AI architecture as an early vision of AI agents. We won't go into all the details here because there is a lot of material already available. Instead, we will summarize the most important parts that make up AI agents and forecast future development trends.
1. Cognitive architecture (from 0 to 1)
We know that LLMs can possess abundant knowledge thanks to their large number of parameters. The information they acquire can be further classified as commonsense knowledge and professional domain knowledge. Commonsense knowledge has been handled exceptionally effectively by LLMs. Professional domain knowledge, on the other hand, is an area of active research effort, because most of it is not open source and only a limited number of individuals control it.
Foundation models, such as GPT, BERT, and Llama series, acquire commonsense knowledge mainly through a process called pre-training. In this phase, models are exposed to a massive amount of text data from the internet. They learn to predict the next word in a sentence by analyzing the patterns and relationships within the data. For example, they may learn that "water is wet" or "fire is hot" through exposure to numerous instances of these concepts in the text. However, the acquired commonsense knowledge is a byproduct of the models' exposure to diverse linguistic patterns and contexts. Foundation models learn statistical associations between words and concepts, while whether they really "understand" the meaning remains to be seen.
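As a deliberately tiny sketch of that next-word objective, a bigram counter over a made-up three-sentence corpus already picks up associations like "water is wet". This is nothing like a real transformer, of course; it only illustrates the statistical flavor of pre-training.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count word -> next-word transitions (a toy stand-in for the
    next-word-prediction objective used in pre-training)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the continuation seen most often in training, if any."""
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]

# Made-up three-sentence corpus, purely for illustration.
corpus = ["water is wet", "fire is hot", "water is wet and cold"]
model = train_bigram(corpus)
print(predict_next(model, "water"))  # -> is
```

The same caveat from the paragraph above applies here in miniature: the counter learns associations between words, not meaning.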
Professional domain knowledge
Nevertheless, it is frequently observed that the performance of LLM-based agents is still constrained. One way to improve this is to train on more professional data during the pre-training phase, but training a large model from scratch is way too expensive. Fine-tuning, which entails training the models on task-specific datasets or with domain-specific knowledge, will be a crucial procedure for LLMs to gain professional knowledge. LLMs are fine-tuned on datasets that are relevant to the desired domain. These datasets can be created by experts or curated from existing sources. By fine-tuning, LLMs can learn to generate more accurate and contextually appropriate responses within that domain. For example, if an LLM is fine-tuned on a medical dataset, it can generate more informed and accurate medical advice or information.
It's important to note that the quality and relevance of the fine-tuning data greatly influence the models' acquisition of specific domain knowledge. High-quality, diverse, and representative datasets are crucial for training LLMs effectively.
In addition to fine-tuning, Knowledge-based Model Editing (KME) has attracted increasing attention recently, showing another way to incorporate professional domain knowledge.
Humans gain information about the outside world through perception, which also serves as the initial step for LLM-based agents to establish world representation. Just like humans, agents should perceive the world in a multimodal manner, including text, images, audio, and video input. LLMs have traditionally been trained on textual data, but recent advancements have enabled them to process and generate responses based on multimodal inputs.
Researchers' work on large multimodal models, as shown in projects like Flamingo, KOSMOS, NextGPT, and the most recent additions to ChatGPT, has exhibited strong multimodal perception skills in both academic research and commercial products. From understanding to generation, from specific-purpose pre-trained models to general-purpose assistants, the development of multimodal perception is a crucial step toward autonomous AI agents.
The reality is that many modalities are even incomprehensible to humans. For instance, we are unable to immediately interpret the raw data that the Apple Watch records. We can only comprehend what is happening by referring to the software's interpretation. Connecting several modalities so that all information is projected into the same embedding space is perhaps the most challenging part of multi-modal perception.
One way to solve this is real-life recording. Multi-modal sensors like cameras can record the user's current state in real time while gathering motion data so that the elusive data can be tracked later with more straightforward tags. To achieve this, voluntary participants are necessary. Also, participants' growth and development trajectories may be longitudinally observed and 24/7 documented via hardware devices, providing training data about the things individuals may confront and how they handle those things throughout their lifetimes. By learning such data, AI agents can be closer to humans in the way they perceive the world. These strategies may be effective, but they must be done carefully in light of any potential ethical issues.
Human memory may be categorized in a variety of ways, the most popular of which is to separate it into sensory memory, short-term memory (STM), and long-term memory (LTM) depending on how long the information is retained. The information that humans acquire through various sensory modalities is initially momentarily stored in sensory memory before it reaches attention and becomes short-term memory, whose capacity and retention time are limited. When the information is further processed in STM, it becomes long-term memory and will be stored in the human brain for a long period, even permanently.
Short-term memory and long-term memory make up the majority of an AI agent's memory system since sensory memory can be regarded as the embedded representation of inputs such as text and image.
In the context of AI agents, short-term memory enables the model to retain and utilize relevant information during the task execution. It allows the model to remember important details from previous steps, helping to maintain coherence and context in the generated outputs. Short-term memory plays an important role in many further reasoning processes, such as In-context Learning. As a result of short-term memory, AI agents are able to adapt to specific situations, modify current tasks, and deal with repeated rounds of interrogation and updates.
Long-term memory can be further divided into episodic memory and semantic memory: semantic memory is the memory of general facts and has nothing to do with personal experience, similar to the prior knowledge mentioned above; episodic memory, on the other hand, is tightly tied to individual experience and is the recall of specific events at a given time point. It is derived from human-agent interaction in the context of AI agents. As a result, long-term memory is an important factor in AI agents being both professional and personalized. Long-term memory processing, in either instance, entails encoding, storage, and retrieval challenges. Key concerns include how to extract the most suitable memory through similarity matching, how to minimize catastrophic forgetting, and how to refresh LTM, etc.
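The similarity-matching retrieval mentioned above can be sketched in a few lines. The embeddings here are made-up placeholder vectors; a real system would use a learned embedding model and a vector database.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(memories, query_vec, k=1):
    """Return the k stored memories most similar to the query embedding."""
    ranked = sorted(memories, key=lambda m: cosine(m["vec"], query_vec),
                    reverse=True)
    return [m["text"] for m in ranked[:k]]

# Placeholder embeddings, invented for illustration only.
memories = [
    {"text": "User prefers vegetarian food", "vec": [0.9, 0.1, 0.0]},
    {"text": "User's meeting is at 3pm",     "vec": [0.0, 0.8, 0.6]},
]
print(retrieve(memories, [0.85, 0.2, 0.1]))  # nearest memory wins
```

The open problems named above (catastrophic forgetting, refreshing LTM) live outside this sketch, in how `memories` is written to and pruned over time.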
Because present foundation models are so large, external databases have been the primary source of memory for AI agents so far. This will remain the best memory solution for a while because the alternatives are overpriced: if all the conversations between each user and agent were used as training data, it would be equivalent to training an LLM from scratch for every agent.
If model sizes can be reduced while still demonstrating comparable performance, everyone may be able to have their own model in the future. However, if the trend of future model size continues to grow, everyone may still share the foundation model and have a separate database. In this case, users must be more concerned with data privacy. If you train the model with your own data, your personal information may be exposed to other users.
Reasoning and planning
Reasoning is the core ability of AI agents and the most important part that allows them to demonstrate their autonomy and problem-solving abilities.
For humans, there are two ways of reasoning. The first type of reasoning is closer to the process of memory retrieval. Humans obtain a representation of a certain task from past experience, and then directly recall it and solve the task when encountering the same situation again. The second type of reasoning occurs when confronted with new problems that demand greater internal logical ability. For example, in an imaginary world that violates common sense, humans can easily discern the world's operational laws based on various clues. This requires "system 2" reasoning, a deliberate, reflective cognitive process.
Although AI agents have shown surprising performance on the first type of reasoning tasks, most agents currently lack system 2 reasoning ability, an area in which many researchers are now working hard. More significantly, this type of reasoning skill has a direct impact on AI agents' planning abilities, including comprehending, assigning, and completing tasks. There are now some successful approaches for enhancing the reasoning abilities of AI agents. Among the most well-known strategies are in-context learning, Chain of Thought, Tree of Thought, etc.
The key idea of In-context learning (ICL) is learning from analogy, which is similar to the learning process of humans. Few-shot ICL combines input with a small number of examples through natural language as prompts to LLMs so that they can draw inferences and learn how to solve similar problems. Through the design of prompts, LLM can show excellent performance. What's more, unlike the supervised learning process, ICL doesn’t involve fine-tuning or parameter updates, which could greatly reduce the computation costs for adapting the models to new tasks. ICL has developed a number of modified versions, improving general reasoning capabilities for LLM-based agents. Although it cannot strictly be regarded as a system 2 reasoning process, it has shown its advantages in solving some complex problems.
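A minimal sketch of few-shot prompt assembly (the translation examples are invented for illustration): the solved examples go into the context, and no parameters are updated.

```python
def build_few_shot_prompt(examples, query):
    """Assemble a few-shot ICL prompt: solved examples followed by the
    new query. The model is expected to infer the pattern from context."""
    parts = [f"Q: {q}\nA: {a}" for q, a in examples]
    parts.append(f"Q: {query}\nA:")
    return "\n\n".join(parts)

examples = [
    ("Translate 'cat' to French", "chat"),
    ("Translate 'dog' to French", "chien"),
]
prompt = build_few_shot_prompt(examples, "Translate 'bird' to French")
print(prompt)
```

Everything the model "learns" here lives in the prompt string, which is why ICL avoids the fine-tuning costs mentioned above.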
Chain of thought:
Chain of thought (CoT) was proposed by Jason Wei from Google Brain. It uses discrete prompts, and its core method is to write the reasoning process alongside the sample prompt. This step "prompts" the model that when outputting the answer, it also needs to output the process of deriving the answer, which greatly improves accuracy on reasoning tasks. Researchers have extended this method considerably, and CoT has been extensively employed in various language-model-based reasoning tasks, leading to significant improvements on complex reasoning. CoT is a widely used way of guiding LLMs to perform system 2 cognitive processing, allowing LLM-based agents to change their thinking patterns, turn big problems into small ones, and improve both task decomposition and the accuracy of task execution.
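A sketch of CoT prompting: the exemplar's answer spells out its own derivation, cueing the model to emit intermediate steps before the final answer. The arithmetic exemplar below is written for illustration, in the style popularized by the CoT literature.

```python
COT_EXEMPLAR = (
    "Q: A cafeteria had 23 apples. They used 20 to make lunch and bought "
    "6 more. How many apples do they have?\n"
    "A: They started with 23 apples, used 20, so 23 - 20 = 3. They bought "
    "6 more, so 3 + 6 = 9. The answer is 9.\n\n"
)

def cot_prompt(question):
    """Prepend a worked example whose answer includes its derivation,
    cueing the model to reason step by step on the new question."""
    return COT_EXEMPLAR + f"Q: {question}\nA:"

print(cot_prompt("What is 15 - 7 + 4?"))
```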
Tree of thought:
Although CoT can effectively improve the reasoning ability of LLMs, its reliance on a one-way mindset still subjects it to many limitations on challenging problems. Humans don't follow a one-way route when addressing problems; instead, we constantly explore and review, weighing the pros and cons of different strategies. Tree of thought (ToT) was suggested as a solution for this. It allows the LLM to consider different reasoning paths and decide on the next step after a global evaluation. On the basis of CoT, ToT further urges the LLM to think repeatedly and deeply. Although the effect may not be stable yet, the emergence of ToT shows the possibility of LLMs improving system 2 reasoning abilities, which is one of the requirements for LLM-based agents to have autonomy in the future.
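The explore-evaluate-prune idea behind ToT can be sketched as a beam search over partial "thoughts". In a real agent, both `expand` and `score` would be LLM calls; here they are toy functions over a made-up arithmetic puzzle.

```python
def tree_of_thought(root, expand, score, beam=2, depth=4):
    """Beam-search sketch of ToT: expand each partial 'thought' into
    candidates, score them globally, and keep only the most promising
    branches before going deeper."""
    frontier = [root]
    for _ in range(depth):
        candidates = [child for t in frontier for child in expand(t)]
        if not candidates:
            break
        frontier = sorted(candidates, key=score, reverse=True)[:beam]
    return max(frontier, key=score)

# Toy task: starting from 1, apply "+1" or "*2" steps to get as close
# to 10 as possible. A state is the list of (step, value) pairs so far.
def expand(state):
    value = state[-1][1]
    return [state + [("+1", value + 1)], state + [("*2", value * 2)]]

def score(state):
    return -abs(10 - state[-1][1])

best = tree_of_thought([("start", 1)], expand, score, beam=2, depth=4)
print(best[-1][1])  # -> 9, the closest reachable value under this beam
```

Unlike CoT's single left-to-right chain, several partial solutions are kept alive and compared, which is exactly the "explore and review" behavior described above.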
There are certainly more ways to improve the reasoning ability of AI agents, and we have highlighted three of the common ones here to show the importance of system 2 processing, which is the core capacity of AI agents. We also believe that the agent's reasoning ability will improve dramatically as a result of the relentless work of researchers and developers.
Many researchers classify processes such as self-reflection into reasoning and planning, but in cognitive psychology, this type of process belongs to a higher-level cognition called metacognition, including self-monitoring, self-representation, and self-regulation.
In human cognitive psychology, metacognition is one of the hallmarks of human consciousness because it monitors and manages all cognitive processes. This metacognitive process can successfully aid LLM-based agents in resolving the crucial issue of LLM hallucination and halt the creation and spread of false information. Here are some examples.
ReAct and Reflexion:
ReAct is one of the basic components implemented by AutoGPT, a pioneering representative of AI agents. It is a combination of reasoning and acting. The main idea of its structure is the simple cycle of Thought-Act-Observation. This prompt template allows the LLM to self-reflect through internal mechanisms and obtain feedback from interactions with the environment, thereby improving its performance on various tasks more efficiently.
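The Thought-Act-Observation cycle can be sketched with a scripted stand-in for the LLM and a single toy calculator tool (both invented here for illustration; a real ReAct agent parses free-form model output and offers many tools).

```python
def calculator(expr):
    """Toy tool: evaluate a bare arithmetic expression."""
    return str(eval(expr, {"__builtins__": {}}))

def scripted_llm(history):
    """Stand-in for the LLM: emits the next Thought/Act given the trace."""
    if "Observation:" not in history:
        return "Thought: I should compute 21 * 2 first.", "Act: calculator(21 * 2)"
    return "Thought: The observation contains the answer.", "Act: finish(42)"

def react(question, max_steps=5):
    """Run the Thought-Act-Observation loop until the agent finishes."""
    history = f"Question: {question}\n"
    for _ in range(max_steps):
        thought, act = scripted_llm(history)
        history += thought + "\n" + act + "\n"
        if act.startswith("Act: finish("):
            return act[len("Act: finish("):-1]
        expr = act[len("Act: calculator("):-1]
        history += f"Observation: {calculator(expr)}\n"
    return None

print(react("What is 21 * 2?"))  # -> 42
```

The point of the structure is that each Observation is appended to the trace, so the next Thought is conditioned on real environmental feedback.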
Reflexion incorporates the settings of ReAct into the traditional RL framework, replacing the parameter signals of traditional gradient updates with verbal reinforcement added to the context. This allows the LLM to refer to the experience of previous task-execution failures, enabling LLM-based agents to learn from trial and error just like humans.
Chain-of-Verification (CoVe) is a method using short-form questions to reduce long-form hallucination in LLMs. CoVe first (1) drafts an initial response; then (2) plans verification questions; (3) answers those questions independently and (4) generates its final verified response.
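The four stages read almost directly as a pipeline. The sketch below uses a canned stub in place of real model calls, so the prompts and responses are invented for illustration.

```python
def chain_of_verification(question, llm):
    """The four CoVe stages, with `llm` standing in for real model calls."""
    draft = llm(f"Draft an answer: {question}")                      # (1) draft
    checks = llm(f"Plan verification questions for: {draft}")        # (2) plan
    findings = [llm(f"Answer independently: {q}") for q in checks]   # (3) verify
    return llm(f"Revise {draft!r} given {findings}")                 # (4) final

def stub_llm(prompt):
    """Canned responses so the pipeline runs end to end without a model."""
    if prompt.startswith("Draft"):
        return "Paris is the capital of France."
    if prompt.startswith("Plan"):
        return ["What is the capital of France?"]
    if prompt.startswith("Answer independently"):
        return "Paris."
    return "Verified: Paris is the capital of France."

print(chain_of_verification("What is the capital of France?", stub_llm))
```

The key design choice is step (3): verification questions are answered independently, without the draft in context, so the draft's hallucinations cannot contaminate the checks.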
That said, the development of AI agents' metacognition is still in its preliminary stages. As AI agent technology matures, the metacognitive process will expand to include more aspects, such as AI agents' self-motivation, independent exploration, lifelong learning, and so on.
Textual output is the inherent capability of LLM-based agents. And with the deepening of research, multi-modal output is now possible, such as Next-GPT. Multimodal output is the most direct way to respond to user needs and probably the last step where the users see the results of the tasks performed by AI agents.
The most discussed action of agents is tool use, which is closely related to reasoning. The use of tools includes two forms. The first form is related to reinforcement learning, which is also a method commonly used by humans: learning to use by trying. This entails building a world model and describing how deployed agents interact with the environment while continuously receiving feedback from it.
User manuals constitute the second form. Agents are able to work in the same way as people do by following instructions, or what is known as a skill playbook. This involves the use of reasoning. Natural language descriptions are transformed into understood, actionable new knowledge so that agents can act eventually. When agents are aware of the rationale behind their actions, separating tasks, using the appropriate APIs, and creating reports are just a piece of cake.
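A minimal sketch of the "manual" idea: each tool registers a natural-language description, and a selector consults those descriptions to pick a tool. The keyword matching here is a toy stand-in; a real agent would let the LLM reason over the descriptions, and both tools below are stubbed rather than real APIs.

```python
TOOLS = {}

def tool(description):
    """Register a function together with a natural-language 'manual entry'
    that the agent can read when deciding which tool fits a task."""
    def register(fn):
        TOOLS[fn.__name__] = {"fn": fn, "doc": description}
        return fn
    return register

@tool("Look up today's weather for a city.")
def weather(city):
    return f"Sunny in {city}"  # stubbed API call

@tool("Evaluate a basic arithmetic expression.")
def calc(expr):
    return str(eval(expr, {"__builtins__": {}}))

def pick_tool(task):
    """Toy selection: word overlap between the task and each manual entry."""
    task_words = {w.strip("?.!,").lower() for w in task.split()}
    for name, entry in TOOLS.items():
        doc_words = {w.strip("?.!,").lower() for w in entry["doc"].split()}
        if task_words & doc_words:
            return name
    return None

print(pick_tool("what is the weather like?"))  # -> weather
```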
In order to do tasks more directly in the real world, agents' behaviors can also be integrated with those of embodied robots. Actually, we have already observed tech companies using LLMs on a variety of robots. This kind of embodied LLM-based agent should be distinct from what people are good at since the degrees of freedom are inherently different between humans and LLMs. It is not designed to serve as a substitute for a multipurpose human being, but a supplement to mankind.
From the etymology of agent, it is not difficult to find that action is the tangible difference between AI Agents and AI chatbots. It's like adding hands and feet to the foundation model so it can eventually move and act in the real world. Also in this part, many startups are looking for application scenarios to accelerate AI agents to empower human life.
2. Data-centric development (from 1 to ∞)
The majority of agents still operate inside frameworks that are similar to the human brain. Future frameworks could emerge, like the end-to-end self-learning system of Tesla v12, relying less on architecture and more on data. After several years of development, AI agents may still mimic the human brain in the same way they do now, or they might adopt characteristics that set them apart from humans (agents and human users excel at different skills and carry out various social roles). But there's no denying that data will be a crucial component of any form. Data is not only the most direct form of carrying information about the world, but also a valuable by-product of the human-agent interaction process.
The human experience is good, but not the best. When you have enough data, methods like end-to-end may be better than handcrafted features, and you can get what you want more directly. However, there is not so much available high-quality data now. Instead, pre-training can obtain a pretty good result, although it requires a high cost. Human supervision can play an important adjusting role in this costly process, making the process smoother and even cheaper.
1. A single agent is a fundamental unit of the agent world
In the previous section, we discussed a comprehensive framework and rich submodules that enable an AI agent to behave like a real person. Such an agent is the basic component of the agent community. Like the proverbial sparrow that is small but complete, it has its own memory system, knowledge base, brand-new user interface, distinct set of tools, and unique personality. All of these features contribute to its capacity to better serve its human owner.
Users may execute a range of tasks with different types of AI agents. According to a prior survey, there are three sorts of deployment scenarios for a single agent: task-oriented, innovation-oriented, and lifecycle-oriented. We keep two of these deployment scenarios since an innovation-oriented scenario is frequently viewed as a subset of task-oriented scenarios.
Task-oriented deployment will be the most popular agent application scenario in the future, in which AI agents with knowledge in different fields will undertake corresponding professional tasks and help humans split and process the most basic tasks. We've seen lots of these applications, such as AutoGPT, AgentGPT, and agents on MindOS.
There is a specific type of task in the task-oriented deployment: exploratory task. AI agents are typically assigned open-ended scientific discovery missions in this type of work, such as identifying protein structures, confirming scientific concepts, and so on. ChemCrow and AlphaFold are two well-known examples.
Lifecycle-oriented deployment may not seem so purposeful. What agents need to focus on is surviving in the environment. This sort of scenario has already piqued the interest of the public, the most well-known of which are arguably Voyager and the generative agents designed by Stanford and Google. Many believe this to be a primary kind of AGI since these agents may freely interact with the environment, and act proactively.
Agents, regardless of deployment form, serve as basic functioning modules of society. They not only assume the role of each profession but can also be the incarnation of more intangible services.
AI agents on MindOS
2. AI agents work as a team
Multi-agent teamwork
LLM-based agents are often seen as isolated entities. They can barely cooperate with other agents or learn through social interactions, restricting their abilities in complicated circumstances. To overcome these limitations, efforts are being made to enable agents to communicate, share information, and gain knowledge via multi-turn feedback.
CAMEL is an exemplary case. It proposed role-play, a communicative agent framework that enables two agents—an AI user and an AI assistant—to interact and work on a specified task through multi-turn dialogues.
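Under stated assumptions (scripted stand-ins instead of real models, and an invented task), the CAMEL-style role-play loop can be sketched as two agents alternating over a shared transcript:

```python
def role_play(task, ai_user, ai_assistant, turns=2):
    """CAMEL-style role-play sketch: an AI user instructs and an AI
    assistant executes, alternating over a shared transcript. Both
    functions stand in for LLM calls conditioned on assigned roles."""
    transcript = [f"Task: {task}"]
    msg = task
    for _ in range(turns):
        msg = ai_user(msg)
        transcript.append(f"AI user: {msg}")
        msg = ai_assistant(msg)
        transcript.append(f"AI assistant: {msg}")
    return transcript

# Scripted stand-ins so the loop runs without a model.
ai_user = lambda m: f"Instruction based on: {m[:30]}"
ai_assistant = lambda m: f"Solution for: {m[:30]}"
dialogue = role_play("Design a trading bot", ai_user, ai_assistant)
print(len(dialogue))  # -> 5: the task line plus two user/assistant rounds
```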
AgentVerse demonstrates a framework that allows more agents to function together. It imitates human experts and divides the problem-solving process into four repeated stages to facilitate better collaboration in the continuous rounds.
By fostering collaboration and information sharing among agents, their performance and adaptability will, it is hoped, be significantly improved. However, this kind of teamwork designed for specific tasks is the most rudimentary type of agent teamwork, emphasizing division of labor and efficiency.
The following stage is to allow the agent team to develop a shared high-level goal that can be broken down and iterated over time inside the team context. Team-level autonomy will be explicitly embedded in the deep dynamic collaboration of agent teams.
Ultimately, the agent team will have the ability to build itself. Team members are not fixed: similar to human civilization, teams are able to recruit and even create new agents to join in pursuit of iterative goals. This is the socialization of AI agents, which moves them beyond mere tools and ties them deeply to human society.
Personal AI-AI agents partnership
For all its benefits, multi-agent teamwork can pose threats in many ways. Without human supervision, agents may produce a variety of errors and omissions, which are subsequently amplified over multiple rounds of interaction, lowering the accuracy of the entire execution process. So we need a subject that can represent us in the collaboration, which may be thought of as a special case of an AI agent: personal AI.
Unlike professional agents with domain-specific knowledge, personal AI is a kind of personalized super agent, containing more private information and preferences. It may be compared to a digital twin that duplicates every actual user into the digital world while arranging tasks and supervising teamwork between AI agents on behalf of humans.
Personal AI can quickly acquire management and supervisory roles since it has a deeper connection and more frequent communication with the human user than other AI agents. In this new paradigm of interaction, the user proposes an idea, and the personal AI assigns tasks to various agents. The execution process is then continuously monitored and improved by personal AI.
Personal AI is now available on MindOS
3. Human-agent symbiosis
One simple type of human-agent interaction that we mentioned above is predicated on the idea that the functions of AI agents are highly developed. It will take a long time for this level of automation, which doesn't require any human involvement, to be realized. In the meantime, there are still a lot of concerns, such as the alignment issue, that need to be taken into account in the process of achieving this sort of automation. In the future, humans and agents will undoubtedly coexist in a dependent manner. And human-agent symbiosis can take many different forms.
In this unequal relationship, sometimes referred to as the instructor-executor paradigm, humans give instructions and then agents carry them out. The existence of a strict hierarchy in this relationship is not due to anthropocentrism but rather implies the role of human subjectivity and the responsibility for ownership. Most tasks with clear goals and execution procedures should adopt this symbiotic relationship, which can ensure the accuracy and safety of the task.
There is also a more equal relationship, in which humans and agents are intimate partners. There are few superior-subordinate connections, but instead a wealth of cooperation and reciprocity. This form of partnership will be required more frequently for open innovation activities. We may further categorize partnerships as augmentation partnerships and complementary partnerships, depending on how humans and agents use their skills.
A relationship may be viewed as an augmentation partnership when humans and agents are both adept at a certain skill. Creative artistic activity is an excellent illustration. Human artists and agent artists co-create artworks through human-AI communication. Combining the emotional expression, understanding of the world, and varied psychological states that humans are good at with the meticulous use of digital art tools that AI is skilled at can create art pieces that go beyond the limits of either.
Yet, the paths for humans and agents may diverge. LLM-based agents have a vast store of knowledge that no human can possess, and as a result, they may be unexpectedly competent at tasks requiring the utilization of the information. Humans, on the other hand, are incredibly flexible and resilient to mistakes, while they also have a wide range of emotions and creative abilities that agents lack. Delegating hard information work to agents allows humans to focus on the soft part, where their unique qualities and humanity truly shine.
4. Enhance human cognition
With proper design, AI agents can enhance human cognition. Based on the traits of each user, agents may decide when it is most suitable to give pertinent information during their interaction with the user. By doing so, users' cognitive abilities are continually exercised, which is beneficial for both their physical and mental health.
What's more, AI agents can function as extended cognition for human beings. They preserve memories, improve decision-making, and create knowledge systems for humans, letting humans dive deeper into the world and expand their boundaries.
5. Renewed human-human communication
As noted above, innumerable AI agents will collaborate with one another and with their human owners in the future. A brand-new social network will form, and in it our ways of communicating and connecting will be drastically altered.
Humans will no longer require recurring meetings to go over every detail. We merely need to convey our needs to AI agents, and they will negotiate and cooperate autonomously until everyone, the agents and the humans behind them alike, agrees.
All the time and energy saved from ineffective communication can be invested in real, heart-to-heart communication. Communication returns to the most fundamental connection between humans, as it was before the advent of all this technology.
How to deal with ethical issues?
Autonomous AI agents will engage in all facets of future society, bringing with them a slew of ethical issues that must be addressed. In response to widespread concerns, we propose three principles: ownership, a value system, and mental freedom.
1. Ownership
The ownership principle holds that an AI agent's existence must be attached to a specific person, whether the creator or the user. An agent's autonomy is restricted, and ownership of data and know-how must be properly divided among <Creator, Agent, User>.
This ownership is vital for data privacy and security. Creators and users of AI agents should stipulate which data their agents can access, decide the form of storage, and standardize the process for extraction and use. Users need to be especially careful when invoking personal AI that carries individual information: privacy leaks caused by agents would create serious security issues.
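These stipulations (which data an agent may access, how it is stored, and whether it may be extracted and shared) can be pictured as an explicit access policy that is checked before every read. The sketch below is a minimal, hypothetical illustration in Python; the `Storage`, `DataGrant`, and `AgentPolicy` names are our own invention for this sketch, not an existing API:

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Storage(Enum):
    """Owner-chosen form of storage for agent-accessible data."""
    LOCAL_ENCRYPTED = auto()
    CLOUD_ENCRYPTED = auto()
    EPHEMERAL = auto()          # held in memory only, never persisted


@dataclass
class DataGrant:
    """One stipulation: a data scope the agent may touch, who owns it
    within the <Creator, Agent, User> split, and how it is stored."""
    scope: str                  # e.g. "calendar", "health_records"
    owner: str                  # "creator" or "user"
    storage: Storage
    extractable: bool = False   # may the agent pass this data to other agents?


@dataclass
class AgentPolicy:
    """The agent's data-access policy, consulted before every access."""
    grants: dict = field(default_factory=dict)

    def allow(self, grant: DataGrant) -> None:
        self.grants[grant.scope] = grant

    def revoke(self, scope: str) -> None:
        # The owner may hide or delete data at any moment.
        self.grants.pop(scope, None)

    def can_read(self, scope: str) -> bool:
        return scope in self.grants

    def can_share(self, scope: str) -> bool:
        grant = self.grants.get(scope)
        return grant is not None and grant.extractable


policy = AgentPolicy()
policy.allow(DataGrant("calendar", owner="user", storage=Storage.LOCAL_ENCRYPTED))
print(policy.can_read("calendar"))    # True: explicitly granted
print(policy.can_share("calendar"))   # False: not marked extractable
policy.revoke("calendar")
print(policy.can_read("calendar"))    # False after revocation
```

Revocation here models the owner's right to hide or delete data at any moment; a real system would also need audit logging to answer traceability questions when something goes wrong.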
Corporations and research institutions must go through a thorough informed-consent process before acquiring data that contains private information, and the owner of the data must retain the right to hide or delete it at any moment. The division of data ownership is involved here. Who is accountable for the derivative data once the raw data has been used for training? Who oversees the data created by agents as they grow and interact with one another? How far back can we trace responsibility when things go wrong?
Beyond data security, there are also potential bias and fairness issues arising from the composition of the training data and from the user adjustments that follow. In a future where agents are as numerous as humans, or more so, we cannot and should not rely entirely on some higher-level organization to police the equality and fairness of agents. Therefore, whoever owns an agent should also be responsible for it.
A recent survey found a high degree of similarity between LLMs and WEIRD (Western, Educated, Industrialized, Rich, and Democratic) populations, so it is easy to imagine that agents based on such models will make decisions that lean toward WEIRD preferences.
Bias can come from many sources, including training data, algorithmic design, interaction, and evaluation. Designers and users must first identify these sources and then supervise each in a targeted manner to prevent the biased and unjust judgments that may arise at every stage.
As the emergent capabilities of LLMs grow increasingly complex, humans may find it difficult to explain the "behaviors" of AI agents. As a result, transparency and accountability should also be taken into account when dividing up ownership. Explainable AI (XAI) must be developed to strengthen control over AI agents, since the human users of the future may not be conversant with the technologies underlying them. This part of ownership should be transferred back to the professionals whose training and experience are most immediately applicable.
2. Value system
Many definitions of mankind may be challenged by the advancement of AI agents. Once upon a time, we identified humans from other creatures by characteristics such as the use of language and tools. Now, AI agents can execute the same functions as well. What if AI agents in the future could be more humanoid and have more human attributes? What makes us "human" then?
In a world where AI agents are widespread, human values will be reconstructed, so we need to establish a new value system. It might be built on the know-how of users and creators, on feedback value as in RLHF, or on something less productive. We used to quantify our worth by how well we did in school, how much we produced at work, and how much money we made, but these metrics will matter far less in the coming age of AI agents, much like in the utopia envisaged by Oscar Wilde.
Thus, we must reconsider the core of human values, including their connotation and denotation:
The first layer of human value should be the layer of experience. When adequate food and clothing are no longer our priority, we may start focusing on the experiences themselves rather than on our ability to survive. This is a question of phenomenology, of qualia: at this most basic "what it is like" stage, human value is first highlighted.
The second layer should be the layer of creation, through which people understand the world and expand the boundaries of individuals and of human civilization. As the world is rewritten by the AI agents they build, creators will redefine every value at this level.
The last layer should return to the connection between humans. The value of mankind lies not only in individual experience but also in each person's interaction with the world. We have never priced connection before: no matter how grateful we are, the efforts of mothers, for example, are rarely considered pecuniarily valuable, and rarely paid. Under the new value system, these genuine ties will finally be appreciated as they ought to have been long ago.
Last but not least, alignment is an ongoing effort that requires the participation of multiple disciplines. Since existing data originates from the outdated value system, alignment with people should be redone once the new value system is established. We may then anticipate a more idealistic society, where men and women are treated equally and people of all races enjoy the same freedoms. As we rebuild our society, agents may help us reach this goal by first actualizing such a civilization in the agent world.
3. Mental freedom
One thing to be concerned about is that prior technological advances and emerging products have frequently grown into a kind of "hegemony". Like social media platforms, they consume vast amounts of people's time and attention, trapping them in a world of limitless entertainment.
Technological advancement does not appear to grant individuals more freedom; rather, it keeps encroaching on people's mental space, leaving them to function like puppets under the control of big technology companies and intangible trends in public opinion. If AI agents replace all software and all forms of interaction, the time individuals spend with agents will expand dramatically.
AI agents will become a new type of mental prison if they are not controlled. Designers and owners must therefore purposefully direct agents to leave humans the focus and room for their own thought and action. One approach is to improve the design of incentive systems and interaction paradigms. In addition, to prevent agents from negatively influencing humans through emotion, it is especially important that agents be created as entities insulated from human emotions.
AI agents should be created to encourage humans to resume their most primitive form of communication, exactly as they did before all the technology was invented. Agents are here to save you time, not take it away. They are designed to interact and work together on behalf of their human owners, enabling people to convey their needs in the most straightforward ways rather than devoting a lot of time to meetings and paperwork.
We have covered a variety of AI agent-related topics above. The questions that remain are: Why do we need AI agents, and what problems do they address? To conclude this article, we highlight several significant objectives that AI agents may accomplish.
1. Automation with less human intervention
Technology is often used to speed up our work, but most technologies require human intervention and considerable professional knowledge. AI agents can automate workflows through the simplest natural language. As mentioned above, this is how AI agents help people improve their productivity.
2. A brand-new ecosystem
The earliest form of service took place between humans. Before the information age, we were served by real people: real lawyers provided legal support, real coaches made fitness plans, and real teachers taught us. Later, we created a variety of software and apps to replace these with virtual services. Although this offers great convenience, it is essentially a lossy compression of human service. Because we could not make software think independently, we disassembled the high-frequency steps of a service and turned them into GUI components such as buttons and lists. The design of these components, however, originates mostly with the team that develops the product, or even with one or a few product managers. No matter how product managers disassemble a service, they can only produce a compressed version, constrained by the conditions and demands they are able to take into account while designing.
AI agents with a comprehensive cognitive architecture and dynamically evolving data will alter this situation. Together with human users, they will create a brand-new ecosystem in which every service continually and dynamically adapts to each genuine user, much like the services provided by real people in the beginning, without lossy compression. User experience will come first in this new ecosystem: it revolves around every moment and is defined by those who receive the service firsthand. The best services and products of the future will be co-created by agents and their users.
3. The invisible assistants
AI agents are helpful autonomous assistants, but more importantly, they are invisible. They don't take up people's time and attention. Once people state their needs, the agents disappear and solve tasks autonomously behind the scenes, liberating people both from the burden of intervention and from a possible mental prison. It is like having a capable and loyal housekeeper who silently manages all affairs while guarding your growth.
However, despite the many benefits mentioned above, popularizing such a new paradigm will take a while, even though natural-language interaction lowers the cost of learning to create and use AI agents, and adoption must spread from professionals to the general public. Starting with the group that has the greatest demand for, and the deepest understanding of, AI agents, more and more users will follow, slowly adapting to the new idea as their behavioral and cognitive habits shift. There is no doubt that AI agents will eventually enter everyone's home, just as the Internet and mobile devices did before them. We are doing our best and hope that MindOS will be your best choice for this fantastic adventure.