This repository provides a minimal template for research projects in Python. It includes a basic structure for organizing code, results, and documentation, and demonstrates how to incorporate good practices in software engineering such as testing, continuous integration, and automated linting and formatting. The template is designed to be lightweight such that it can be easily adapted to your specific needs.
Motivation: Research projects often grow organically: Experiments accumulate, code gets copied across notebooks, and results become difficult to trace. A small amount of structure helps prevent this and makes it easier to understand, reproduce, and maintain your code, or share it with collaborators.
Credits & License: This template was created by Lukas Tatzel and is available at https://github.com/ltatzel/Research-Project-Template. If you find it useful, feel free to reference the original repository. In case you have any questions or suggestions for improvement, please open an issue on GitHub.
This template implements a minimal set of practices that help keep research code organized and reliable.
- Version Control: Version control allows you to track the history of a project. Every change to the codebase is recorded and can be inspected or reverted later. It also facilitates collaboration, as multiple people can work in different branches and merge their changes together. This repository is a GitHub template, which means that when you create a new repository based on this template, it will already be set up with git for version control.
- Testing: Automated tests help ensure that your code behaves as expected. For example, if you know the expected output of a function for certain inputs, you can write a test that verifies this behavior automatically. This template uses the `pytest` package for running tests.
- Continuous Integration: Whenever code changes are made, it is useful to automatically check whether everything still works. This process is called Continuous Integration (CI). This template includes a GitHub workflow in `.github/workflows` that automatically creates a fresh environment on GitHub, installs the project dependencies, and runs the tests using `pytest`. If something fails, the workflow reports the error. Because the environment is built from scratch, CI also ensures that all required dependencies are explicitly listed in the `project_template_env.yml` file, which is important for reproducibility.
- Code Linting and Formatting: Linting checks your code for errors, bad practices, and style violations (e.g. unused imports or undefined variables). A formatter automatically rewrites your code to follow consistent style rules (e.g. a specific maximum line length). This template uses Ruff, a fast Python linter and formatter that supports both Python files and Jupyter notebooks.
- Clean Code: Writing clean and readable code is always advisable. Some useful principles include:
  - Use meaningful names for variables, functions, etc.
  - Avoid duplicating code. If you find yourself copying and pasting code, the alarm bells should be ringing in your head. It will often make sense to extract the common code into a reusable function or module.
  - Prefer simple solutions over complex ones.
  - Each code unit should perform only a single, clearly defined task (this is also known as the Single Responsibility Principle).
  - Write docstrings describing what your code does. Here's an example with a simple one-line docstring:

    ```python
    def add_10(input_number):
        """Add 10 to the given number `input_number`."""
        return input_number + 10
    ```

    There are different conventions for docstrings that you can adopt, e.g. the Google or NumPy style. But even a short description as in the example above is better than no docstring at all.
These small habits make your code easier to understand for collaborators and your future self.
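The Testing practice above can be illustrated with the `add_10` docstring example. The following is a minimal sketch of a pytest-style test. It assumes the function lives in `source/our_library/some_functions.py` and the test in `source/tests/test_some_functions.py`. The test name `test_add_10_known_values` is hypothetical, and the function is defined inline only so that the snippet runs on its own. pytest collects functions whose names start with `test_` (in files named `test_*.py`) and reports every failing `assert` as a test failure.

```python
# In the template, the function would be imported instead of defined here:
# from our_library.some_functions import add_10


def add_10(input_number):
    """Add 10 to the given number `input_number`."""
    return input_number + 10


def test_add_10_known_values():
    # Compare against outputs we know to be correct.
    assert add_10(0) == 10
    assert add_10(-10) == 0
    assert add_10(2.5) == 12.5
```

Running `pytest .` from the `source` directory would then pick up this test automatically. No explicit `import pytest` is needed for plain `assert`-based tests.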
The repository is organized into four main folders.
- source: The `source` folder contains reusable code that may be used across multiple experiments. Since we want to be able to import this code, it should be organized in Python files (not Jupyter notebooks). Also, since this code is meant to be reused, it should include basic documentation and tests. The library code lives in `source/our_library`. This folder can be installed as a local Python package via `pip install -e .` (details in section 3). After installation, functions can be imported in experiments very conveniently: `from our_library.some_functions import add_10`.

  The `source` folder also contains a `tests` directory for automated testing. A useful convention is to mirror the structure of the library and the tests. For instance, if you have a file `source/our_library/some_functions.py`, you should implement the corresponding tests in `source/tests/test_some_functions.py`.
- experiments: The `experiments` folder contains the actual research experiments. Each experiment has its own numbered folder, e.g. `experiments/01_first_experiment`. Experiments can be implemented using Jupyter notebooks or Python scripts. Separating reusable code (`source`) from experiment-specific code (`experiments`) helps keep projects organized. A good rule of thumb: If you find yourself copying code between experiments, it should probably be moved to `source`.
- results: The `results` folder stores outputs generated by our experiments. Its structure mirrors the `experiments` folder. So, if you have an experiment in `experiments/01_first_experiment`, the corresponding results should be stored in `results/01_first_experiment`. This makes it easy to keep track of which results belong to which experiment.

  Typical outputs include plots, tables, trained models, or intermediate data. Because result files can be large, the contents of the `results` folder are excluded from git tracking by default. However, if your files are small, you can explicitly track a subfolder by adding an exception such as `!results/01_first_experiment/` to the `.gitignore` file. An alternative for large files is to use Git LFS.
- documentation: The `documentation` folder contains project documentation. Currently, it includes a minimal LaTeX paper template in `documentation/paper`. Additional material such as slides or reports can also be placed here. Since everything lives in the same repository, figures from the `results` folder can easily be included in the paper.

  If you prefer writing papers in an online LaTeX editor, platforms such as Overleaf can also be connected to GitHub repositories. This allows you to edit the paper collaboratively in Overleaf while keeping the source files synchronized with the repository. More information on this can be found here.
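The convention that `results` mirrors `experiments` can be automated, so that each experiment always writes to the matching folder. The sketch below assumes the layout described above, with `experiments` and `results` as sibling folders at the repository root. The helper name `results_dir_for` is hypothetical and not part of the template.

```python
from pathlib import Path


def results_dir_for(experiment_dir: Path) -> Path:
    """Return (and create) the results folder that mirrors `experiment_dir`.

    For example, `<repo>/experiments/01_first_experiment` is mapped to
    `<repo>/results/01_first_experiment`.
    """
    experiment_dir = Path(experiment_dir).resolve()
    # Assumption: every experiment folder sits directly inside `experiments`.
    if experiment_dir.parent.name != "experiments":
        raise ValueError(f"{experiment_dir} is not inside an 'experiments' folder")
    repo_root = experiment_dir.parent.parent
    results_dir = repo_root / "results" / experiment_dir.name
    # Create the folder (and `results` itself) if it does not exist yet.
    results_dir.mkdir(parents=True, exist_ok=True)
    return results_dir
```

In a Python script inside an experiment folder, you could call `results_dir_for(Path(__file__).parent)`. In a Jupyter notebook, where `__file__` is not defined, `Path.cwd()` is a common substitute if the notebook is started from its own folder.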
Prerequisites: To use this template, you need a GitHub account and git
installed on your local machine. In addition, you need conda installed to
create and manage the Python environment used in this project. If you want to
compile the LaTeX paper, you will also need a LaTeX distribution such as TeX
Live installed on your machine.
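To confirm these prerequisites, you can check whether the corresponding command-line tools are on your `PATH`. The sketch below uses only Python's standard `shutil.which`. The executable names are assumptions: `pdflatex` is one common compiler shipped with TeX Live, but your LaTeX setup may use a different command.

```python
import shutil

# Executables expected on the PATH (assumed names; adjust to your setup).
REQUIRED_TOOLS = ["git", "conda"]
OPTIONAL_TOOLS = ["pdflatex"]  # only needed to compile the LaTeX paper


def check_tools(tools):
    """Map each tool name to True if it is found on the PATH, else False."""
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    for tool, found in check_tools(REQUIRED_TOOLS + OPTIONAL_TOOLS).items():
        status = "found" if found else "MISSING"
        print(f"{tool:>10}: {status}")
```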
If this is the first time you are using this template, it is recommended to work through all of the following steps:
- Use the Template and Clone: On the GitHub page of this repository, click on "Use this template" to create a new repository based on this template. Give the new repository a name that makes sense for your project and set it to private if you do not want to share it publicly. You can also add a project description here. After creating the new repository, use `git clone` to download it to your local machine.
- Set Up the Environment: First, create and activate the environment.

  OPTIONAL: If you want, you can already modify the template here to give the environment a meaningful name. For this, change the name of the `.yml` file, the `name` field in that file, and the GitHub workflow in `.github/workflows` accordingly.

  ```bash
  conda env create -f project_template_env.yml
  conda activate project_template_env
  ```

  This will install the required external dependencies listed in `project_template_env.yml`. Next, install the local library.

  OPTIONAL: Again, you can modify the template here and rename the `our_library` folder. Note, however, that this requires additional changes (e.g. in the `setup.py` file and import statements).

  ```bash
  cd source
  pip install -e .
  ```

  For this to work, we need the `setup.py` file in `source`. Take a look at that file. We also need the `__init__.py` file in `source/our_library`. This ensures that the `our_library` folder is recognized as a Python package. The `-e` flag installs the package in editable mode, meaning that changes to the source code immediately become available without reinstalling the package.

  You can verify the installation with `conda list`. Make sure that the local package `our_library` appears in the list of installed packages.
- Explore the Repository: Explore the structure of the repository. Note that the structure of the `tests` folder mirrors the structure of the `our_library` folder, and `results` mirrors `experiments`. This is a useful convention that helps keep things organized.
- Run the Tests and Ruff: Take a look at `source/our_library/some_functions.py` and the corresponding tests in `source/tests/test_some_functions.py`. Note that the tests use `@pytest.mark.parametrize`, which is a convenient way to run the same test with multiple inputs. Run `pytest .` from the `source` directory to see the automated tests in action. Feel free to extend `our_library` and add additional tests.

  In addition, you can manually run the linter and formatter via

  ```bash
  ruff check .
  ruff format .
  ```
- Trigger the GitHub Workflow: Take a look at the GitHub workflow `run_pytest.yml` in `.github/workflows`. This workflow is triggered automatically whenever you push changes to GitHub (see the section beginning with `on`). It creates a fresh environment, installs the dependencies, checks the code using Ruff, and runs the tests using `pytest`. You can make a small change to the code (e.g. by adding another test case to `test_some_functions.py`) and push it to GitHub. Then, open your browser, go to your repository, and click on the Actions tab to inspect the workflow.
- Enable Automatic Checks Before Commits: You can run formatting and linting automatically before each commit using the `pre-commit` package. This helps catch issues early and keeps the codebase clean without any extra effort. Run `pre-commit install`, which uses the configuration specified in the `.pre-commit-config.yaml` file. After this, every time you create a commit (e.g. via the terminal or an editor like VS Code), the code will be automatically checked and formatted using Ruff. If issues are found (or Ruff reformats a file), the commit is aborted; resolve the issues, stage the changes again, and retry the commit. You can also run the pre-commit checks manually via `pre-commit run --all-files`.
- Run the Experiments: Take a look at the experiments in the `experiments` folder. Note how reusable code from `source/our_library` is imported and how results are written to the `results` folder. Run the first experiment to see this in action. Feel free to add an additional experiment. For this, you could add code to `source/our_library`, write a corresponding test, import the new function in your experiment, and finally generate and save some results.
- Compile the LaTeX Paper: Try compiling the LaTeX paper in `documentation/paper`. The template demonstrates how figures from the `results` folder can be included in the paper.
- Adapt and Extend the Template: Adapt and extend the template as needed for your own research project.
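As a complement to `conda list` in the environment setup step above, you can check from Python whether the local library is importable and where it is imported from. With an editable install, the reported location should point into your clone's `source/our_library` folder rather than into the environment's `site-packages`. The sketch below uses only the standard library (`importlib.util.find_spec`). The helper name `locate_package` is hypothetical, and the package name `our_library` follows the template; change it if you renamed the folder.

```python
import importlib.util
from typing import Optional


def locate_package(name: str) -> Optional[str]:
    """Return the file a top-level package would be imported from, or None if not installed."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    location = locate_package("our_library")
    if location is None:
        print("our_library is not importable; did you run `pip install -e .` in `source`?")
    else:
        # For an editable install, this should point into source/our_library.
        print(f"our_library is imported from: {location}")
```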