"Civilization advances by extending the number of important operations which we can perform without thinking about them." — Alfred North Whitehead
Use AI to improve research efficiency and expand the space of exploration.
From focused tools to agents that participate in and reshape the research pipeline.
This repository focuses on recent, high-signal work on AI-driven research itself: systems that participate in the research loop, optimize it, or redefine who drives the loop.
Start Here · By Research Stage · By Role · By Section · By Domain · Benchmarks · Taxonomy · Contributing
- **If you care about AI for Research systems:** start with End-to-End Research Systems and Benchmarks & Evaluation to compare what representative AI-for-research systems actually cover and how they are evaluated.
- **If you care about AI for Research in a vertical domain:** start with By Domain, then move to the relevant section pages and Benchmarks & Evaluation for the systems and evaluation anchors that matter in that discipline.
- **If you care about self-evolving systems:** start with Self-Evolving Systems, then use Experimentation & Agent Methods to compare explicit self-improvement loops, experiment-improve systems, and reusable agent methods.
Representative papers and systems that give a fast first read on the current AI4Research landscape.
- **Autoresearch** (2026) · Experimentation & Agent Methods · L3. A self-improving AI research repo where agents iteratively rewrite a small training stack, run short experiments, and keep better variants. Repo
- **Learning to Discover at Test Time** (2026) · Experimentation & Agent Methods · L3. A test-time training system that uses reinforcement learning on a single target problem so the model can keep improving while searching for stronger scientific and algorithmic solutions. Paper · Project
- **AlphaEvolve** (2025) · Experimentation & Agent Methods · L3. A Gemini-powered evolutionary coding agent for discovering better algorithms and scientific solutions through repeated proposal, execution, and selection. Paper · Blog
- **AI co-scientist** (2025) · Research Ideation · L2. A Gemini-based multi-agent collaborator for generating literature-grounded hypotheses, overviews, and experimental protocols. Paper · Blog
- **Can LLMs Generate Novel Research Ideas?** (2024) · Research Ideation · L1. A large-scale blind human study comparing LLM-generated and expert-generated NLP ideas, with LLM ideas rated higher on novelty but slightly lower on feasibility. Paper · Project
- **The AI Scientist** (2024) · End-to-End Research Systems · L3. A full-loop AI scientist for idea generation, code writing, experimentation, paper drafting, and simulated review. Paper · Repo
- **AlphaFold 3** (2024) · Experimentation & Agent Methods · L1. A biomolecular structure and interaction prediction system that extends AlphaFold from protein folding to complexes involving proteins, nucleic acids, ligands, and ions. Paper · Repo · Project
Use this view when you want to find systems by the stage of research they most clearly serve.
Discover & Synthesize · 14 · Ideate · 19 · Plan & Design · 4 · Implement · 14
Execute & Experiment · 31 · Analyze & Visualize · 9 · Write & Review · 7
This ladder separates human-driven tools, human-in-the-loop collaborators, and AI-driven systems.
- **L1 Tools** · 19. Human drives the loop. The AI acts as a tool, local component, or narrow assistant rather than as a workflow owner.
- **L2 Collaborators** · 27. Human in the loop. The AI can advance multi-step work, but humans still steer, gate, or validate important decisions.
- **L3 Systems** · 37. AI drives the loop. The system owns substantial execution and iteration, while humans mainly provide goals, constraints, or downstream review.
| Section | Focus | Count |
|---|---|---|
| 🔬 End-to-End Research Systems | Systems that cover multiple core stages of the research process and aim to complete a relatively full research loop with limited human intervention. | 14 |
| ⚙️ Experimentation & Agent Methods | Systems, methods, and specialized agents whose main contribution is iterative experimentation, optimization, search, reflection, or self-improving execution, rather than a full end-to-end research workflow. | 40 |
| 💡 Research Ideation | Systems focused on generating research questions, hypotheses, directions, or project ideas. | 14 |
| 📚 Literature Discovery & Synthesis | Systems focused on finding, organizing, comparing, and synthesizing prior work. | 5 |
| 📝 Survey / Review Automation | Systems designed to produce structured surveys, reviews, or systematic review-style outputs. | 4 |
| 🧱 Research Infrastructure & Frameworks | Frameworks, platforms, runtimes, and engineering environments for building and operating research agents. | 6 |
| 📏 Benchmarks & Evaluation | Benchmarks, datasets, metrics, and evaluation frameworks for research agents and AI-for-research systems. | 16 |
Use this view when you want to find systems through the disciplines where they are being applied, rather than through research stage or role in the loop.
| Domain | Focus | Count |
|---|---|---|
| Artificial Intelligence | Applications in artificial intelligence research, machine learning research workflows, and AI-for-AI systems with explicit AI research targets or evaluation. | 45 |
| Biomedical | Applications in biology, medicine, drug discovery, and biomedical literature. | 4 |
| Chemistry | Applications in chemical reasoning, synthesis, and molecular discovery. | 2 |
| Computer Science | Applications in non-AI computer science research such as formal methods, programming systems, and algorithmic reasoning. | 4 |
| General | Cross-domain or discipline-agnostic systems for AI-driven research workflows, literature work, agent infrastructure, and research methodology that are not clearly anchored to one vertical field. | 22 |
| Materials Science | Applications in materials discovery, materials property reasoning, and experimental design. | 2 |
| Math | Applications in mathematical reasoning, theorem proving, and formal proof discovery. | 3 |
| Physics | Applications in physical science reasoning, modeling, and scientific analysis. | 2 |
| Social Science | Applications in social simulation, policy analysis, behavioral science, and computational social experiments. | 2 |
Benchmarks are a core surface of this repository rather than an appendix. They make it easier to separate promising demos from systems that are tested on realistic scientific or AI-research workloads.
The main benchmark hub lives at `docs/benchmarks/index.md`.
- **FrontierScience** (2025) · Scientific Reasoning. An expert-written benchmark for Olympiad-style and research-style scientific reasoning across physics, chemistry, and biology.
- **Scientist-Bench** (2025) · System Benchmark · Ideation & Discovery. A benchmark surface introduced alongside AI-Researcher for assessing guided and open-ended autonomous AI research.
- **MLE-Bench** (2024) · System Benchmark. A benchmark built from Kaggle competitions to measure how well AI agents perform at machine learning engineering.
- **ScienceAgentBench** (2024) · System Benchmark · Scientific Reasoning. A task-level benchmark for evaluating language agents on authentic data-driven scientific discovery problems.
- **System Benchmark**: measures end-to-end AI research systems or realistic multi-step workflows.
- **Scientific Reasoning**: measures scientific problem solving, expert analysis, or domain reasoning quality.
- **Ideation & Discovery**: measures hypothesis generation, novelty, or open-ended scientific discovery quality.
The repository uses a lightweight taxonomy so entries can be read through research stage, role in the loop, application domain, and evidence quality without turning the README into a flat list.
- Research stage taxonomy for the macro stage map and fine-grained stage legend.
- Role taxonomy for the L1 to L3 human-tool-system ladder, with `self-evolving` tracked as a tag.
- Application domain taxonomy for discipline-oriented navigation across artificial intelligence, biomedical, chemistry, computer science, general, materials science, math, physics, and social science.
- Benchmark taxonomy for the compressed evaluation vocabulary used across benchmark pages.
- Evidence taxonomy for how we interpret evidence strength across papers, reports, benchmarks, and repositories.
If you find this repository useful in your research, please cite:
```bibtex
@misc{awesome_ai_for_research_2026,
  author       = {Jing, Yi and Xin, Amy and Yao, Zijun},
  title        = {Awesome AI for Research},
  year         = {2026},
  howpublished = {\url{https://github.com/THU-KEG/Awesome-AI-for-Research}},
  note         = {GitHub repository}
}
```

This repository is curated, and contributions are welcome when they improve the source data.
The most helpful contributions are:
- adding a new entry
- correcting links or metadata for an existing entry
Please keep the scope narrow: recent, high-signal AI4Research systems over historical completeness.
When contributing:
- prefer primary sources for papers, repositories, benchmark pages, and project sites
- edit the source data rather than generated Markdown pages
- leave featured selections, taxonomy files, and templates unchanged unless a broader change is clearly necessary
- regenerate the repository with `python3 tooling/build.py` before submitting a pull request