The Development and Future of Artificial Intelligence

Speaker: Zhang Yaqin
Location: Tsinghua University Humanities Forum
Date: December 2025

Today, we face a significant opportunity: artificial intelligence (AI), which is ushering in the Fourth Industrial Revolution.

Major Trends in Technological Advancement

First, I would like to discuss major technological trends and the insights brought by AI.

Zhang Yaqin, founding director of the AI Industry Research Institute (AIR) at Tsinghua University. Former president of Baidu, senior vice president at Microsoft, and chair of Microsoft Asia R&D Group.

After hundreds of thousands of years of evolution, the human brain weighs less than 3 pounds and consumes only about 20 watts, yet it makes humans remarkably intelligent. It contains 86 billion neurons with trillions of synaptic connections, and its storage capacity is at least 1 petabyte.

Our understanding of the human brain is still developing, and we may know less than 10% of it. Early on, American scientist Paul MacLean proposed the triune brain theory, dividing the brain into three levels: one for physiological functions like breathing and movement, another for emotions, and a higher level for reasoning and decision-making. Although this theory is not precise, it provides an intuitive way to understand the brain. Today, we know that the brain has over 150 functional areas, responsible for functions such as sound, vision, language, and movement. Human memory is particularly fascinating, encompassing innate DNA memory, short-term memory in the hippocampus, long-term memory in the cortex, as well as explicit and implicit memories. Most of human intelligence derives from these different types of memory.

At the 2025 World Internet Conference, an intelligent robotic hand demonstrates fine motor skills mimicking human hand movements. Xinhua News Agency.

Nobel laureate Daniel Kahneman classified human thinking into two systems: System 1 is fast thinking, which generates intuition and quick decisions without deep contemplation; System 2 is slow thinking, requiring deep analysis and reasoning, reflecting higher intelligence. These systems can convert into one another; when we become familiar with something, slow thinking can turn into muscle memory and intuition. For instance, in the early stages of learning to drive, we consciously focus on traffic rules and signals, but with practice, driving becomes a natural behavior—this is the process of system conversion.

AI is essentially the process of learning human intelligence. For years, we have been exploring the essence of intelligence. The term “Artificial Intelligence” was officially defined in 1956, but its theoretical foundations trace back further—British scientist Alan Turing defined “computation” and “intelligence” and proposed the Turing Test: if a machine can engage in conversation to the point where humans cannot distinguish it from another human, it has passed the test. Two other foundational figures often overlooked are Claude Shannon, the father of information theory, who defined bits and information entropy, and Norbert Wiener, the father of cybernetics, who defined feedback, learning, and adaptation—these foundational concepts have been crucial for AI development.

Over the years, various schools of thought have emerged in AI, broadly falling into two main approaches. One, the symbolic school, holds that the logic, rules, and reasoning processes of the brain can be represented symbolically. This yields a beautiful and concise logical system with clear causal relationships and transparent machine reasoning, but its main drawback is that it has proven impractical in real-world applications. The other, the connectionist school, argues that the brain is too complex for intelligence to be written down as rules, so intelligence must instead come from vast amounts of data, accumulated experience, continuous learning, and adaptation through connections with the world. The mainstream deep learning techniques of the last 10-20 years take this approach.

Several milestone events in AI history deserve attention. In 2016, the Go program AlphaGo defeated world champion Lee Sedol 4-1. In 2017, Ke Jie, another top player, faced AlphaGo three times and lost all three games. AlphaGo’s intelligence stemmed from deep learning, reinforcement learning, and Monte Carlo tree search, and it learned from hundreds of thousands of human games, a remarkable achievement. Even more impressive is AlphaGo’s successor, AlphaGo Zero, which learned by playing against itself without human game data and evolved at an astonishing rate. AlphaGo Zero played 100 games against its predecessor and won all 100. Its generalization, AlphaZero, can play not only Go but also chess and other games, leading the DeepMind team to declare they would no longer play against humans, as AI had surpassed human capabilities in all games. AlphaGo and AlphaFold represent a crucial concept: intelligent agents.

At the “AI Mirror—Nanjing AI Ecological Street”, staff demonstrate an AI glasses product. Xinhua News Agency.

Using a similar logic but different algorithms, DeepMind also introduced AlphaFold, solving the long-standing problem of protein structure prediction. What would take humans 10 billion dollars and years of research was completed by AlphaFold in just one year.

In 2024, the Nobel Prizes in Physics and Chemistry were awarded to foundational figures in AI, including DeepMind founder Demis Hassabis, whose team created both AlphaGo and AlphaFold. In January 2025, I had an interesting conversation with him in Davos about new drug development, biological computing, and the future of AI.

Another significant milestone occurred in 2022 with the emergence of OpenAI’s ChatGPT. Previous deep learning or neural networks primarily focused on specific tasks, essentially being advanced pattern recognition technologies like speech recognition, facial recognition, image recognition, or character recognition. However, ChatGPT introduced a new paradigm; it can not only recognize but also generate and create, marking the advent of generative AI.

Generative AI encompasses three critical elements: unified representation (tokenization), scaling laws, and emergence effects. The most important, in my opinion, is unified representation. How does ChatGPT work? It resembles human neurons: we have 86 billion neurons, each with the same basic structure regardless of function, whether vision, hearing, movement, or memory. Similarly, generative AI transforms all incoming signals into tokens, and its core task is to predict and generate the next token. It can generate text, images, and videos, and is now widely used. It can also create new data, code, mathematical equations, and tools; it can not only generate tools but also use them, and it can even generate new proteins, molecules, materials, and drugs. When the parameter count of a large language model exceeds roughly one hundred billion, scaling laws trigger emergence effects: the model’s performance does not grow linearly but leaps as the scale expands, resulting in unexpected new capabilities.
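To make "predict and generate the next token" concrete, here is a minimal sketch in Python. It is not how ChatGPT works internally: it assumes a whitespace tokenizer and a simple bigram frequency table in place of a neural network, and every function name in it is hypothetical. It only illustrates the loop of turning input into tokens and then repeatedly predicting the next one.

```python
from collections import Counter, defaultdict

def tokenize(text):
    # Hypothetical tokenizer: lowercase whitespace split. Real systems use
    # learned subword vocabularies; this is for illustration only.
    return text.lower().split()

def train_bigram(tokens):
    # Count how often each token follows each other token.
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, token):
    # Return the most frequent follower of `token`, or None if the token
    # was never seen with a follower.
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

def generate(counts, start, max_new_tokens=5):
    # Repeatedly predict the next token and append it to the sequence.
    out = [start]
    for _ in range(max_new_tokens):
        nxt = predict_next(counts, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return out

corpus = "the model predicts the next token and the model generates text"
counts = train_bigram(tokenize(corpus))
print(generate(counts, "the", max_new_tokens=3))
```

A large model replaces the frequency table with a network that scores every possible next token given the whole preceding context, but the outer generation loop has the same shape.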

Another important milestone is the emergence of DeepSeek in January 2025. Before DeepSeek, China had over a hundred large models, most of which mimicked the technical paths and algorithmic architectures of models like ChatGPT. At that time, I said that our gap with the U.S. in large models was about two to three years. DeepSeek is a small startup located just 5-10 minutes from Tsinghua University, and many of its team members are Tsinghua students. DeepSeek represents a new path: by innovating in algorithms, technology, and system architecture, it achieved capabilities similar to leading U.S. models with just 1% of the computational power. After DeepSeek’s introduction, our gap with the U.S. in large models shrank to about 2-3 months, essentially a version difference, and in some applications we may perform better. Moreover, it is open source and was rapidly adopted by many countries and regions that cannot afford large models, accelerating the deployment and application of large models overall. Thus, I refer to this as the “DeepSeek Moment,” a moment for China.

From Generative AI to Intelligent Agent AI

In 2025, the AI field witnessed another significant transition—from generative AI to intelligent agent AI. Previously, we followed the “scaling law”: more data and stronger computing power lead to better model performance, with emergence effects appearing at certain stages. However, in 2025, we found that the scaling effects in the pre-training phase of language models were slowing down, data resources were becoming saturated, and the marginal returns of increasing computing power were diminishing. In contrast, the importance of the post-training phase was becoming increasingly prominent. This is akin to human growth: pre-training is like the academic phase, where knowledge is accumulated through study, while post-training resembles practical experience after entering the workforce, continually learning and evolving in specific scenarios, which is also the core source of intelligent agent AI.
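One simple way to picture the diminishing marginal returns mentioned above is a power-law curve. The sketch below is purely illustrative: the functional form, the constants `a`, `alpha`, and `floor`, and the compute budgets are all assumptions chosen for the example, not measurements from any real model.

```python
def loss(compute, a=10.0, alpha=0.1, floor=1.0):
    # Hypothetical power-law curve: loss falls as compute grows and
    # approaches an irreducible floor. All constants are made up.
    return floor + a * compute ** (-alpha)

def marginal_gains(computes):
    # Loss reduction obtained from each successive step up in compute.
    losses = [loss(c) for c in computes]
    return [prev - cur for prev, cur in zip(losses, losses[1:])]

budgets = [10 ** k for k in range(1, 7)]  # 10x more compute at each step
for budget, gain in zip(budgets[1:], marginal_gains(budgets)):
    print(f"compute={budget:>8}  loss improvement={gain:.3f}")
```

Under these assumptions, each tenfold increase in compute buys a smaller improvement than the one before it, which is the pattern the pre-training discussion describes.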

What is an intelligent agent? As a highly intelligent species, humans can set tasks and goals, plan paths to achieve them, and learn through trial and error, leveraging strong memory to complete tasks. For example, if students want to learn AI, they will consider which teacher’s course to take, compare the best ones, identify reference books, prepare for exams, and decide on practice problems, breaking down the goal of learning AI and finding the best path to achieve it; this is one of our core human traits. When AI intelligent agents learn from human intelligence, they possess three key capabilities:

Autonomous Learning: This differs significantly from automatic learning; autonomous learning has no fixed rules and learns through exploration, whereas automation typically follows predefined rules.

Evolutionary Capability: Through continuous iteration, they can improve, and after evolving, they can apply previously learned knowledge to similar tasks. This is a crucial distinction between human intelligence and that of other species: human intelligence can accumulate over generations. In contrast, species closely related to humans, like chimpanzees, show no significant difference in intelligence between generations.

Generalization Ability: This is the ability to apply learned skills to similar areas. For instance, if a person learns how to book tickets, they can use that skill in other contexts like reimbursement or shopping. Generalization is a human characteristic, but it is also subject to limitations. For example, some students excel in science but may not perform equally well in humanities. I have a friend who is exceptionally smart and does great work, but it took him 15 years to obtain a driver’s license, and he crashed shortly after. Nonetheless, we hope AI can possess this generalization ability.

Users ask questions on the DeepSeek mobile app. Xinhua News Agency.
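The agent behavior described above (setting a goal, breaking it into subtasks, learning by trial and error, and relying on memory) can be sketched as a small loop. This is a toy illustration under stated assumptions, not a description of any real agent framework: `run_agent`, the plan table, the candidate actions, and the success table are all hypothetical, and the "environment" is a lookup table loosely modeled on the student learning AI.

```python
def run_agent(goal, decompose, candidate_actions, try_action, memory=None):
    """Pursue a goal by decomposing it, exploring actions, and remembering successes."""
    memory = {} if memory is None else memory
    log = []
    for subtask in decompose(goal):
        if subtask in memory:
            # Reuse what was learned earlier instead of exploring again.
            log.append((subtask, memory[subtask], "recalled"))
            continue
        for action in candidate_actions(subtask):
            # Trial and error: try each candidate until one succeeds.
            if try_action(subtask, action):
                memory[subtask] = action
                log.append((subtask, action, "explored"))
                break
        else:
            log.append((subtask, None, "failed"))
    return memory, log

# Toy world (all entries are made up for illustration).
PLAN = {"learn AI": ["pick a course", "find a reference book", "do practice problems"]}
OPTIONS = {
    "pick a course": ["course A", "course B"],
    "find a reference book": ["book X", "book Y"],
    "do practice problems": ["problem set 1"],
}
WORKS = {
    ("pick a course", "course B"),
    ("find a reference book", "book X"),
    ("do practice problems", "problem set 1"),
}

def decompose(goal):
    return PLAN.get(goal, [goal])

def candidate_actions(subtask):
    return OPTIONS.get(subtask, [])

def try_action(subtask, action):
    return (subtask, action) in WORKS

memory, first_log = run_agent("learn AI", decompose, candidate_actions, try_action)
_, second_log = run_agent("learn AI", decompose, candidate_actions, try_action, memory)
print([status for _, _, status in first_log])
print([status for _, _, status in second_log])
```

On the first run the agent has to explore; on the second it recalls what worked. Real agents replace the lookup tables with a model that proposes plans and actions, but the split between exploration and remembered experience is the point of the sketch.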

The realization of these intelligent capabilities relies on a fundamental element—data. The essence of data is digitalization, and our technological foundation is built on digitalization. First, we digitized the information world, followed by the physical and biological worlds. Over the past 40 years, our most important work has been digitalization. This effort began in 1985 with content and document digitization, converting our voice, images, videos, text, and presentations into digital content. Later, with technologies like HTML, we achieved a major milestone—the internet, first with PC internet and then mobile internet. We also digitized enterprises, implementing information systems like ERP, CRM, databases, and various business processes. This phase produced two major outcomes: databases and cloud computing. Now, our physical world is undergoing digital transformation, with cars, roads, traffic lights, and cities being digitized, as well as our power grids, homes, workshops, and factories. The entire physical world is experiencing a digital revolution. Simultaneously, the biological world, including proteins, brains, cells, and genes, is also being digitized.

The director of the MIT Media Lab proposed during the onset of digitalization 1.0 that we are transitioning from “atoms” to “bits.” Now, we are returning from bits to atoms and moving towards molecules—the new generation of intelligence is a fusion of information intelligence, physical intelligence, and biological intelligence, integrating bits, atoms, and molecules, as well as carbon-based life and silicon-based worlds.

Practice of the AI Industry Research Institute (AIR) at Tsinghua University

In December 2020, I founded the AI Industry Research Institute (AIR) at Tsinghua University. The “I” in AIR has three meanings: International, Artificial Intelligence, and Industry. Our mission is clear: to empower industries through AI innovation and promote social progress; our goal is to create an international, intelligent, and industrial research institution for the Fourth Industrial Revolution.

To achieve this goal, the core task is to cultivate future technology leaders. We adopt a dual-engine model of “Academia + Industry,” where most teachers combine strong academic achievements with rich industry experience. Currently, the institute has over 20 faculty members, more than 100 postdoctoral and doctoral students, and over 400 interns, making it one of the most active and productive institutions in the global AI field.

Diverse Applications of Intelligent Agent AI

Next, I will introduce the specific applications of intelligent agents from three dimensions: information intelligence, physical intelligence, and biological intelligence.

Information Intelligent Agents: From Solving Mathematical Problems to Scientific Research

One of the core challenges for intelligent agents is to achieve autonomous, evolutionary, and generalizable capabilities, allowing them to operate across various devices like smartphones, PCs, glasses, watches, and TVs, and be applied in multiple scenarios such as shopping, travel, and enterprise supply chain management. More importantly, we hope intelligent agents can accomplish more advanced tasks, such as solving mathematical problems, inventing equations, and posing new questions.

The team led by Professor Li Peng at AIR collaborated with Professor Shing-Tung Yau’s mathematics research team at Tsinghua University to develop the mathematical intelligent agent AIM. AIM can decompose tasks and complete theorem proofs. For example, in proving the important problem of homogenization in materials science and molecular dynamics, AIM produced a 17-page proof document. This is an excellent example of human-AI collaboration, as feedback from math teachers indicated that the most challenging parts of the proof were completed by AI.

Although AIM’s current proof capabilities have certain limitations, I believe that within five years, AI will be able to independently prove more difficult mathematical problems, such as the seven Millennium Prize Problems (one of which, the Poincaré conjecture, has been solved, leaving six, including the P versus NP problem in computer science and the Riemann hypothesis) or the Goldbach conjecture. I made a bet with Professor Yau that AI will prove at least one of these difficult problems within five years. Regardless of the specific timeline, the core significance lies in the potential of AI to prove challenging problems, propose new questions, and generate new equations.

Physical Intelligent Agents: From Robots to Autonomous Driving

Unlike current language models, intelligent agents in the physical world must possess vision, language, and action capabilities to construct a “world model.” The system developed by Professor Cao Ting’s team at AIR achieves the core functions of physical world robotic intelligent agents—through perception, reasoning, evolution, actions, and reward mechanisms, it generates decisions and actions to direct robots in completing tasks.

Professor Zhan Xianyuan’s team developed the X-VLA system, which attempts to solve the generalization problem for intelligent agents. Traditional robots find it challenging to transfer learned skills to other robots or different scenarios. The X-VLA system, requiring only 900 million parameters, can be deployed across different robotic arms and machines, achieving skill transfer across devices and scenarios. For instance, if a robotic arm learns to fold clothes, it can still perform the task after changing to a different robotic arm or adjusting the table height, and it can transfer related skills to other tasks like household chores, adapting to the environment entirely through autonomous learning.

Autonomous driving is another significant application of physical intelligent agents and has been a topic of my ongoing interest. Autonomous driving is extremely challenging, requiring vehicles to accurately perceive complex traffic environments, plan paths, and make real-time safe decisions, integrating various core AI technologies; thus, it is considered the “culmination of artificial intelligence.” Significant progress has been made globally in autonomous driving, and the entire industry is transitioning from technological research to commercial deployment.

Biological Intelligent Agents: From New Drug Development to Intelligent Healthcare

AI’s application in the biological intelligence field is first reflected in the acceleration of new drug development. Demis Hassabis mentioned in our Davos dialogue that all human diseases might be cured in the next decade, a viewpoint that may be overly optimistic, but AI can indeed significantly shorten the drug development cycle.

Professor Lan Yanyan’s team at AIR developed a new technology for drug screening, decoding over 20,000 protein structures through AlphaFold to identify “pocket targets,” which are then matched with tens of billions of proteins. Currently, fewer than 10% of proteins can be used for drug development, and many protein molecular structures remain unexplored. Using AI algorithms, this technology has achieved a million-fold increase in screening speed. This research was published in the journal Science.

Professor Nie Zaiqing’s team created a new drug development intelligent agent capable of decomposing tasks based on development needs, automatically searching for information, analyzing protein structures and functions, and generating preliminary development maps, significantly enhancing the efficiency of new drug development and providing crucial support for researchers.

Another breakthrough in AI in healthcare is the establishment of the world’s first intelligent agent hospital—the Tsinghua University AI Hospital (established in April 2025). This is a virtual hospital where roles such as doctors, patients, and nurses are all performed by intelligent agents, covering various departments and forming a complete diagnostic and treatment loop. Intelligent agents evolve through collaboration and competition without the need for manual data labeling. It is essential to emphasize that AI intelligent agent doctors are not intended to replace human doctors but to assist them in improving diagnostic efficiency and accuracy. Currently, this system is being tested in several medical institutions, including Tsinghua University Hospital and Chang Gung Memorial Hospital.

Future Technological Development and Industrial Landscape

The “Operating System” of the AI Era

Next, I would like to discuss future technological development trends, particularly changes in the industrial landscape.

I worked at Microsoft for nearly 16 years, during which I led the development of the world’s largest embedded operating system, Windows CE. An operating system is the most crucial technological platform defining an era; once an operating system is established, the chips, applications, and the entire technology ecosystem are built around it. In the PC era, the operating system was Windows, the chip architecture was x86, and applications were developed around this platform. In the mobile internet era, the operating systems were iOS and Android, and in China, Huawei’s HarmonyOS. The chip architecture changed to ARM, and applications evolved into mobile apps like WeChat and short video. In the AI era, large models serve as the operating system. Around this operating system, the mainstream chip architecture has shifted to the GPU, and the application ecosystem has changed. The scale of technology in this AI era is significantly larger than in the mobile internet and PC eras, potentially by one or two orders of magnitude.

In March 2023, I drew a framework diagram for the AI era: with cutting-edge large models as the operating system, the upper layer encompasses industry vertical systems and SaaS applications, while the end devices (smartphones, PCs) run distilled or compressed smaller models for apps. By October 2025, I updated this architecture, with the core change being replacing SaaS and apps with intelligent agents—I believe intelligent agents will be the future SaaS and apps. While mobile apps will remain mainstream in the short term, intelligent agent functionalities will gradually be integrated into them.

Path to Achieving Artificial General Intelligence (AGI)

Intelligent agents are the inevitable path to achieving Artificial General Intelligence (AGI). Currently, there is no unified definition of AGI; my understanding is that it is characterized by evolutionary capability, generalization, and long-term memory, and that it surpasses 99% of humans in 99% of tasks. To achieve AGI, several critical issues need to be addressed, such as constructing world models that align with physical laws, understanding causal relationships, and optimizing memory systems. Current AI memory is relatively crude and shallow, while human memory is a complex and central part of intelligence.

Based on this definition, I believe we will reach AGI levels within 15-20 years and be able to pass a “new Turing test.” The Turing test initially focused on text-based conversations but has now extended to various fields. First, in the information domain, I believe we can achieve AGI levels in content generation within five years. Within ten years, we can realize AGI in physical intelligence, as autonomous vehicles have essentially passed the technical threshold, while humanoid robots will require more time. Currently, various humanoid robots perform well, and there are many related studies, including dexterous hands and facial muscle control technologies, but achieving true human-like capabilities will likely take at least another ten years. I am optimistic about this industry, as I believe it will be a massive market. However, humanoid robots are still in the research phase and have not yet reached full-scale production. More importantly, in the biological intelligence field, such as brain-computer interfaces, the integration of biological entities with AI, and the digitization of life forms, I believe achieving AGI in this area will take about 20 years.

From the trajectory of internet development, we began the PC internet era in 1995, the mobile internet era in 2005, and the Internet of Things era in 2015, which is the era of everything being interconnected. Now, I believe we have entered a new era—the intelligent agent internet era, or the Internet of Agents.

There is a particularly interesting concept—Agent Swarm—proposed in 2025, suggesting that future human interactions will occur through intelligent agents, forming collective intelligence through collaboration, competition, and error correction, evolving into structures similar to the neural networks of the human brain, ultimately giving rise to an “intelligent agent economy.” This intelligent agent economy will fundamentally change economic forms, human organizational structures, and enterprise operation models: the core assets of enterprises will become chips, data centers, data, and AI models; team formations will no longer be limited to hiring human employees, as intelligent agents will become an essential component.

Risks and Governance of Artificial Intelligence

We must also emphasize one crucial aspect: while intelligent agents in AI bring tremendous opportunities and capabilities, they also come with significant risks.

Risks include several layers: first, in the information intelligence domain, we have already seen that AI can generate false information, create deep fakes, and sometimes produce hallucinations, which can be used to deceive others. Additionally, there are issues regarding copyright ownership.

As of November 2025, over 50% of the internet information we use is generated by AI. How can we mitigate the risks hidden within this information? This requires collaborative efforts from technology, policy, and regulatory aspects. However, I believe the risks currently present in this field are manageable.

In the physical world, connecting large models, intelligent agents, autonomous vehicles, robots, drones, and military systems poses greater risks if the collaboration and competition among intelligent agents become uncontrollable or maliciously exploited. In the biological intelligence domain, if human brains connect with AI, while this can bring significant benefits, we can also imagine the potential risks of loss of control and misuse. Therefore, we need to research and address these issues. This requires collaboration among scientists, technology developers, product designers, and policy experts to create an effective governance framework that should be global in scope. I am personally confident that humanity, having evolved for so many years, can invent advanced tools while also managing them effectively.

Currently, artificial intelligence is transitioning from discriminative AI to generative AI and gradually moving towards intelligent agent AI. The new wave of AI is a fusion of information intelligence, physical intelligence, and biological intelligence, integrating atoms, bits, and molecules. In this process, we possess astronomical amounts of data and exponential computational power; more importantly, humans and machines will co-evolve, creating enormous industrial opportunities—according to the Davos AI Council, by 2030, new opportunities brought by AI will generate approximately $20 trillion in economic value. At the same time, we must confront a series of social challenges, including privacy protection, security assurance, employment transition, social equity, and risk governance.

Artificial intelligence is opening the Fourth Industrial Revolution. I firmly believe that with strong national power, abundant talent, and favorable policies, China will undoubtedly become a leader in this revolution.