"Civilization advances by extending the number of important operations which we can perform without thinking about them." — Alfred North Whitehead
Use AI to improve research efficiency and expand the space of exploration.
From focused tools to agents that participate in and reshape the research pipeline.
This repository focuses on recent, high-signal work on AI-driven research itself: systems that participate in the research loop, optimize it, or redefine who drives the loop.
Start Here · By Research Stage · By Role · By Section · By Domain · Benchmarks · Taxonomy · Contributing
- **If you care about AI for Research systems:** start with End-to-End Research Systems and Benchmarks & Evaluation to compare what representative AI-for-research systems actually cover and how they are evaluated.
- **If you care about AI for Research in a vertical domain:** start with By Domain, then move to the relevant section pages and Benchmarks & Evaluation for the systems and evaluation anchors that matter in that discipline.
- **If you care about self-evolving systems:** start with Self-Evolving Systems, then use Experimentation & Agent Methods to compare explicit self-improvement loops, experiment-improve systems, and reusable agent methods.
Representative papers and systems that give a fast first read on the current AI4Research landscape.
- **Autoresearch** (2026) · Experimentation & Agent Methods · L3. A self-improving AI research repo where agents iteratively rewrite a small training stack, run short experiments, and keep better variants. Repo
- **Learning to Discover at Test Time** (2026) · Experimentation & Agent Methods · L3. A test-time training system that uses reinforcement learning on a single target problem so the model can keep improving while searching for stronger scientific and algorithmic solutions. Paper · Project
- **AlphaEvolve** (2025) · Experimentation & Agent Methods · L3. A Gemini-powered evolutionary coding agent for discovering better algorithms and scientific solutions through repeated proposal, execution, and selection. Paper · Blog
- **AI co-scientist** (2025) · Research Ideation · L2. A Gemini-based multi-agent collaborator for generating literature-grounded hypotheses, overviews, and experimental protocols. Paper · Blog
- **Can LLMs Generate Novel Research Ideas?** (2024) · Research Ideation · L1. A large-scale blind human study comparing LLM-generated and expert-generated NLP ideas, with LLM ideas rated higher on novelty but slightly lower on feasibility. Paper · Project
- **The AI Scientist** (2024) · End-to-End Research Systems · L3. A full-loop AI scientist for idea generation, code writing, experimentation, paper drafting, and simulated review. Paper · Repo
- **AlphaFold 3** (2024) · Experimentation & Agent Methods · L1. A biomolecular structure and interaction prediction system that extends AlphaFold from protein folding to complexes involving proteins, nucleic acids, ligands, and ions. Paper · Repo · Project
Use this view when you want to find systems by the stage of research they most clearly serve.
Discover & Synthesize · 14 · Ideate · 19 · Plan & Design · 4 · Implement · 14
Execute & Experiment · 31 · Analyze & Visualize · 9 · Write & Review · 7
This ladder separates human-driven tools, human-in-the-loop collaborators, and AI-driven systems.
- **L1 Tools** · 19. Human drives the loop. The AI acts as a tool, local component, or narrow assistant rather than as a workflow owner.
- **L2 Collaborators** · 27. Human in the loop. The AI can advance multi-step work, but humans still steer, gate, or validate important decisions.
- **L3 Systems** · 37. AI drives the loop. The system owns substantial execution and iteration, while humans mainly provide goals, constraints, or downstream review.
| Section | Focus | Count |
|---|---|---|
| 🔬 End-to-End Research Systems | Systems that cover multiple core stages of the research process and aim to complete a relatively full research loop with limited human intervention. | 14 |
| ⚙️ Experimentation & Agent Methods | Systems, methods, and specialized agents whose main contribution is iterative experimentation, optimization, search, reflection, or self-improving execution, rather than a full end-to-end research workflow. | 40 |
| 💡 Research Ideation | Systems focused on generating research questions, hypotheses, directions, or project ideas. | 14 |
| 📚 Literature Discovery & Synthesis | Systems focused on finding, organizing, comparing, and synthesizing prior work. | 5 |
| 📝 Survey / Review Automation | Systems designed to produce structured surveys, reviews, or systematic review-style outputs. | 4 |
| 🧱 Research Infrastructure & Frameworks | Frameworks, platforms, runtimes, and engineering environments for building and operating research agents. | 6 |
| 📏 Benchmarks & Evaluation | Benchmarks, datasets, metrics, and evaluation frameworks for research agents and AI-for-research systems. | 16 |
Use this view when you want to find systems through the disciplines where they are being applied, rather than through research stage or role in the loop.
| Domain | Focus | Count |
|---|---|---|
| Artificial Intelligence | Applications in artificial intelligence research, machine learning research workflows, and AI-for-AI systems with explicit AI research targets or evaluation. | 45 |
| Biomedical | Applications in biology, medicine, drug discovery, and biomedical literature. | 4 |
| Chemistry | Applications in chemical reasoning, synthesis, and molecular discovery. | 2 |
| Computer Science | Applications in non-AI computer science research such as formal methods, programming systems, and algorithmic reasoning. | 4 |
| General | Cross-domain or discipline-agnostic systems for AI-driven research workflows, literature work, agent infrastructure, and research methodology that are not clearly anchored to one vertical field. | 22 |
| Materials Science | Applications in materials discovery, materials property reasoning, and experimental design. | 2 |
| Math | Applications in mathematical reasoning, theorem proving, and formal proof discovery. | 3 |
| Physics | Applications in physical science reasoning, modeling, and scientific analysis. | 2 |
| Social Science | Applications in social simulation, policy analysis, behavioral science, and computational social experiments. | 2 |
Benchmarks are a core surface of this repository rather than an appendix. They make it easier to separate promising demos from systems that are tested on realistic scientific or AI-research workloads.
The main benchmark hub lives at `docs/benchmarks/index.md`.
- **FrontierScience** (2025) · Scientific Reasoning. An expert-written benchmark for Olympiad-style and research-style scientific reasoning across physics, chemistry, and biology.
- **Scientist-Bench** (2025) · System Benchmark · Ideation & Discovery. A benchmark surface introduced alongside AI-Researcher for assessing guided and open-ended autonomous AI research.
- **MLE-Bench** (2024) · System Benchmark. A benchmark built from Kaggle competitions to measure how well AI agents perform at machine learning engineering.
- **ScienceAgentBench** (2024) · System Benchmark · Scientific Reasoning. A task-level benchmark for evaluating language agents on authentic data-driven scientific discovery problems.
- **System Benchmark**: measures end-to-end AI research systems or realistic multi-step workflows.
- **Scientific Reasoning**: measures scientific problem solving, expert analysis, or domain reasoning quality.
- **Ideation & Discovery**: measures hypothesis generation, novelty, or open-ended scientific discovery quality.
The repository uses a lightweight taxonomy so entries can be read through research stage, role in the loop, application domain, and evidence quality without turning the README into a flat list.
- Research stage taxonomy for the macro stage map and fine-grained stage legend.
- Role taxonomy for the L1 to L3 human-tool-system ladder, with `self-evolving` tracked as a tag.
- Application domain taxonomy for discipline-oriented navigation across artificial intelligence, biomedical, chemistry, computer science, general, materials science, math, physics, and social science.
- Benchmark taxonomy for the compressed evaluation vocabulary used across benchmark pages.
- Evidence taxonomy for how we interpret evidence strength across papers, reports, benchmarks, and repositories.
If you find this repository useful in your research, please cite:
```bibtex
@misc{awesome_ai_for_research_2026,
  author       = {Jing, Yi and Xin, Amy and Yao, Zijun},
  title        = {Awesome AI for Research},
  year         = {2026},
  howpublished = {\url{https://github.com/THU-KEG/Awesome-AI-for-Research}},
  note         = {GitHub repository}
}
```

This repository is curated, and contributions are welcome when they improve the source data.
The most helpful contributions are:
- adding a new entry
- correcting links or metadata for an existing entry
Please keep the scope narrow: recent, high-signal AI4Research systems over historical completeness.
When contributing:
- prefer primary sources for papers, repositories, benchmark pages, and project sites
- edit the source data rather than generated Markdown pages
- leave featured selections, taxonomy files, and templates unchanged unless a broader change is clearly necessary
- regenerate the repository with `python3 tooling/build.py` before submitting a pull request