EarthLink: A Self-Evolving AI Agent System for Climate Science

  • 1 Shanghai Artificial Intelligence Laboratory
  • 2 Fudan University
  • 3 Institute of Atmospheric Physics, Chinese Academy of Sciences
  • 4 The University of Sydney
  • 5 Shanghai Jiao Tong University
  • 6 Nanjing University of Information Science and Technology
  • 7 Seoul National University
  • 8 Ocean University of China
  • 9 Laoshan Laboratory
  • 10 Columbia University
  • 11 Japan Agency for Marine-Earth Science and Technology
  • 12 The Chinese University of Hong Kong
  • 13 East China Normal University
  • 14 Chinese Academy of Meteorological Sciences
  • 15 Tsinghua University
  • Equal contribution
  • * Corresponding authors.

Abstract


Modern Earth science is at an inflection point. The vast, fragmented, and complex nature of Earth system data, coupled with increasingly sophisticated analytical demands, creates a significant bottleneck for rapid scientific discovery. Here we introduce EarthLink, the first AI agent designed as an interactive copilot for Earth scientists. It automates the end-to-end research workflow, from planning and code generation to multi-scenario analysis. Unlike static diagnostic tools, EarthLink can learn from user interaction, continuously refining its capabilities through a dynamic feedback loop. We validated its performance on core scientific tasks of climate change, ranging from model-observation comparisons to the diagnosis of complex phenomena. In a multi-expert evaluation, EarthLink produced scientifically sound analyses and demonstrated an analytical competency that was rated as comparable to specific aspects of a human junior researcher's workflow. Additionally, its transparent, auditable workflows and natural language interface empower scientists to shift from laborious manual execution to strategic oversight and hypothesis generation. EarthLink marks a pivotal step towards an efficient, trustworthy, and collaborative paradigm for Earth research in an era of accelerating global change. The system is accessible at our website https://earthlink.intern-ai.org.cn.

Workflow



The EarthLink workflow for automated climate research. a, The Planning Module generates multiple candidate plans based on user requests and literature, utilizing a Plan Aggregation Agent to synthesize an optimal plan with optional human supervision. b, The Scientific Diagnosis Module executes the plan via a Coding Agent. It incorporates a self-correction loop where Result Checking and Image Feedback Agents ensure autonomous debugging and code refinement. c, The Multi-Scenario Analysis Module interprets the results and generates comprehensive reports to support decision-making across domains such as energy, agriculture, and environment. d, The Resource Libraries underpin the workflow, providing essential knowledge, data, and tools to support the agents across all stages.
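The self-correction loop in panel b can be pictured as a bounded generate, run, and check cycle. The sketch below is only an illustration under assumptions: `generate_code`, `check_result`, and `max_rounds` are hypothetical stand-ins rather than EarthLink's actual interfaces, and the "agents" are plain Python callables.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class DebugOutcome:
    code: Optional[str]   # candidate that passed the check, or None if none did
    rounds: int           # number of generate/check rounds that were used
    feedback: List[str] = field(default_factory=list)  # messages from failed rounds


def self_correct(
    generate_code: Callable[[str, List[str]], str],  # hypothetical coding agent
    check_result: Callable[[str], Optional[str]],    # hypothetical checker; None means "pass"
    plan: str,
    max_rounds: int = 5,                             # assumed budget, not taken from the source
) -> DebugOutcome:
    """Regenerate code until the checker accepts it or the budget runs out."""
    feedback: List[str] = []
    for round_idx in range(1, max_rounds + 1):
        candidate = generate_code(plan, feedback)
        problem = check_result(candidate)
        if problem is None:
            return DebugOutcome(candidate, round_idx, feedback)
        feedback.append(problem)  # the next round sees what went wrong
    return DebugOutcome(None, max_rounds, feedback)


# Toy agents: the coder repairs a misspelled variable once it has been reported.
def toy_coder(plan: str, feedback: List[str]) -> str:
    return "plot(tas_mean)" if feedback else "plot(tas_maen)"


def toy_checker(code: str) -> Optional[str]:
    return None if "tas_mean" in code else "NameError: tas_maen is not defined"


outcome = self_correct(toy_coder, toy_checker, plan="plot global mean tas")
print(outcome.rounds, outcome.code)
```

Passing the accumulated feedback back to the generator is the essential design choice here: each retry is conditioned on earlier failures, and the round counter gives a natural "debug rounds" statistic.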

EarthLink understands biases between model simulations and observations



Multi-level evaluation of EarthLink on a number of core climate analysis tasks. a, Level 1: Multicomponent statistical feature comparison. EarthLink conducts diagnostic analyses across domains by comparing the CMIP6 simulation of climatological features, such as spatial patterns and variabilities, with observations. Examples include seasonal cycles of precipitation, cloud radiative effects, global temperature change, ocean heat content (OHC) timeseries, 20°C isotherm depth, Arctic ice climatology, Antarctic surface albedo, and runoff. b, Level 2: Mechanistic diagnosis. EarthLink estimates scenario-driven metrics such as equilibrium climate sensitivity (ECS) and transient climate response (TCR), demonstrating its ability to extract relevant datasets and implement standard diagnostic methods. c, Level 3: Physical process diagnosis. The system performs advanced analyses such as ENSO diversity classification and period detection, displaying emergent capacity in physical reasoning and chain-of-thought synthesis. Note that most of the image elements are directly produced by EarthLink, and the others are only slightly adjusted in layout.
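To make the Level 2 metrics concrete, the sketch below shows one widely used way to estimate them. It assumes the Gregory regression for ECS (regressing the top-of-atmosphere net radiation anomaly on the surface temperature anomaly in an abrupt-4xCO2 run and halving the temperature intercept) and the conventional TCR definition (mean warming over years 61 to 80 of a 1pctCO2 run). The caption does not state which methods EarthLink applies, so the function names and the synthetic numbers are illustrative assumptions.

```python
def ols_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def gregory_ecs(delta_t, net_toa):
    """ECS from abrupt-4xCO2 anomalies via N = F + lambda * dT.

    The regression line crosses N = 0 at dT = -F / lambda (equilibrium
    warming for 4xCO2); dividing by 2 gives the value for a CO2 doubling.
    """
    feedback, forcing_4x = ols_fit(delta_t, net_toa)
    return -forcing_4x / (2.0 * feedback)


def tcr_from_1pct(delta_t_annual):
    """TCR as the mean annual warming over years 61-80 of a 1pctCO2 run."""
    window = delta_t_annual[60:80]  # zero-based slice covering years 61..80
    return sum(window) / len(window)


# Synthetic anomalies (not model output): F = 7.4 W m^-2, lambda = -1.0 W m^-2 K^-1.
delta_t = [1.0, 2.0, 3.0, 4.0, 5.0]
net_toa = [7.4 - 1.0 * t for t in delta_t]
ecs = gregory_ecs(delta_t, net_toa)

# Synthetic linear warming of 0.025 K per year over 140 years.
ramp = [0.025 * year for year in range(1, 141)]
tcr = tcr_from_1pct(ramp)
print(round(ecs, 3), round(tcr, 4))
```

In practice the inputs would be annual global means taken relative to the matching piControl period, and multi-model results are usually reported model by model.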

EarthLink understands climate change under different scenarios



Application of EarthLink to understand climate change under different scenarios. a, Climate change detection, attribution, and future projection. EarthLink processes multi-model CMIP6 simulations under various experiments, accurately distinguishing between the effects of natural and anthropogenic forcings and generating global temperature anomaly timeseries. b, Constrained projections of future surface temperature for selected regions. Using hierarchical emergent constraints (HEC) and spatial aggregation approaches, EarthLink reduces projection uncertainty for city-level temperatures under the SSP2-4.5 scenario (2041–2060). c, Constrained projections of future temperature changes in Africa using constraining factors automatically identified by EarthLink. Note that most of the image elements in a–c are directly produced by EarthLink, and the others are only slightly adjusted in layout. d, Differentiated task scorecard. The system's performance across evaluation tasks is summarized, highlighting relative strengths in planning, coding, and visualization. The maximum score for each item is 5.
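The idea behind constrained projections in panels b and c can be illustrated with a basic single-predictor emergent constraint. This is deliberately simpler than the hierarchical emergent constraint (HEC) named in the caption and is not EarthLink's implementation: it assumes a linear inter-model relationship between an observable quantity and the projected change, and it uses the regression residual spread as a crude proxy for the constrained uncertainty. All values are synthetic.

```python
import math
import statistics


def emergent_constraint(x_models, y_models, x_obs):
    """Basic linear emergent constraint across a model ensemble.

    x_models: observable quantity simulated by each model (historical period)
    y_models: projected future change from each model
    x_obs:    observed value of the same quantity
    Returns (constrained_estimate, raw_spread, constrained_spread).
    """
    n = len(x_models)
    mean_x = sum(x_models) / n
    mean_y = sum(y_models) / n
    sxx = sum((x - mean_x) ** 2 for x in x_models)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_models, y_models))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    constrained = intercept + slope * x_obs
    raw_spread = statistics.stdev(y_models)  # unconstrained ensemble spread
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(x_models, y_models))
    constrained_spread = math.sqrt(sse / (n - 2))  # residual standard deviation
    return constrained, raw_spread, constrained_spread


# Synthetic four-model ensemble (illustrative numbers only).
past_trend = [1.0, 2.0, 3.0, 4.0]
future_warming = [3.1, 4.9, 7.2, 8.8]
estimate, raw, narrowed = emergent_constraint(past_trend, future_warming, x_obs=2.0)
print(round(estimate, 2), round(raw, 2), round(narrowed, 2))
```

A fuller treatment would also propagate observational uncertainty in `x_obs` and the regression prediction error, both of which are omitted here.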

EarthLink understands hidden mechanisms of climate phenomena



EarthLink's autonomous research process for improving Atlantic Niño predictability. a, User request and experimental plan. The user tasked EarthLink with enhancing the 8-month-lead forecast skill of the summer Atlantic Niño index (ATL3). The system independently designed a full experimental plan covering precursor selection, model setup, validation protocol, etc. b, Main visualization results. EarthLink executed the experimental plan, producing visualizations of (top) the correlations between 8-month-lead predictors and the ATL3 index, (middle) a time-series comparison of observed values against hindcasts from the multiple linear regression (MLR), random forest (RF), and gradient boosting (GB) models, and (bottom) scatter plots evaluating the hindcast skill of each model. Most of the image elements are directly produced by EarthLink, and the others are only slightly adjusted in layout. c, Result analysis. EarthLink integrated the diagnostic outcomes into a concise mechanistic summary, outlining the physical processes that link the identified precursors to the subsequent development of the Atlantic Niño.
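The lead correlations in the top row of panel b rest on a simple alignment step: each predictor value is paired with the target index a fixed number of months later. The sketch below shows that step for generic monthly series with a plain Pearson correlation. The function names and the synthetic data are assumptions for illustration. A seasonal target such as a summer-mean ATL3 would additionally require selecting the matching calendar months, which is not shown.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    return sab / math.sqrt(saa * sbb)


def lead_correlation(predictor, target, lead_months=8):
    """Correlate predictor[t] with target[t + lead_months]."""
    n = min(len(predictor), len(target))
    if lead_months < 0 or lead_months >= n - 1:
        raise ValueError("lead_months must leave at least two overlapping samples")
    x = predictor[: n - lead_months]  # earlier predictor values
    y = target[lead_months:n]         # later target values
    return pearson(x, y)


# Synthetic monthly series: the target copies the predictor 8 months later.
months = 120
predictor = [math.sin(2 * math.pi * t / 30) for t in range(months)]
target = [predictor[t - 8] if t >= 8 else 0.0 for t in range(months)]

r_lead8 = lead_correlation(predictor, target, lead_months=8)
r_lead0 = lead_correlation(predictor, target, lead_months=0)
print(round(r_lead8, 3), round(r_lead0, 3))
```

For a hindcast evaluation, the same alignment would be applied inside each cross-validation fold so that no target information from the verification period leaks into predictor selection.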

Sensitivity analyses of EarthLink performance and the self-evolving mechanism



Comparison of different foundation models and self-evolution rounds. a, Mean success rates of diagnostic tasks across four task levels for five foundation models (GPT-5, Gemini-2.5-Pro, Grok-4, Claude-Sonnet-4, and Llama-4-Maverick). b, Distributions of debug rounds across all task levels for the same models, with colors corresponding to those in (a). c, d, Mean time cost (c) and API cost (d) across task levels. e, Comparison of debug rounds and time cost distributions between the first and second evolution rounds (using GPT-5 as the foundation model). Tasks at level 4 are semi-open problems and are therefore excluded from the evolution experiments.