For what it's worth, this is my preferred structure for a python/R project (provided code is in python, but similar concepts exist in R)
The core principles are:
In projects, updates (not code runs) generally happen in discrete buckets:
- Data Preprocessing
- Modeling
- Output
Because of this we don't want to mix the code across buckets. This lets us revert individual files to a specific commit, rather than figuring out what has to be reverted inside each file, and it makes testing your code in discrete units simpler. NOTE: moving a spec-file from OS X to Linux is a problem here, because `conda list --explicit` pins specific package builds, and those are architecture-specific.
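Concretely, that split might look like the layout below (directory names are illustrative, not prescriptive):

```
project/
├── src/
│   ├── preprocessing/
│   ├── modeling/
│   └── output/
├── tests/
├── conf/
├── spec-file.txt
└── setup.py
```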
I prefer using a spec-file over a requirements file because in DS projects it's easier to recreate the conda env from it:

```
conda list --explicit > spec-file.txt              # export the exact package list
conda create --name new_env --file spec-file.txt   # recreate the env
```

Make sure you run the following command in your project:

```
pip install -e .   # run this under your code directory
```

I like using hydra-core to maintain run configurations. It keeps a default set of configurations that you maintain through a YAML file, and each parameter can then be overridden from the command line. This also tells engineering which bits of your configuration might be user-configurable.
It pays to break down each step of data processing into a specific class. This allows each separate step to
- be debugged independently,
- tested independently,
- be reverted individually with a git command if clients are indecisive, rather than by hand-modifying code (with tests this is less error-prone as well),
- be shipped easily: you can pickle the single pipeline object, and any transforms that require fitting come included without needing to do anything special,
- be wrapped by engineers into an application without needing to untangle your work.
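The pattern above can be sketched with plain classes in the scikit-learn fit/transform style (these are simplified stand-ins, not sklearn's actual classes; sklearn's `Pipeline` gives you the same thing for free):

```python
import pickle

class StandardScaler:
    """One processing step: learns mean/std in fit, applies them in transform."""
    def fit(self, xs):
        self.mean = sum(xs) / len(xs)
        self.std = (sum((x - self.mean) ** 2 for x in xs) / len(xs)) ** 0.5
        return self

    def transform(self, xs):
        return [(x - self.mean) / self.std for x in xs]

class Pipeline:
    """Chains steps; pickling it captures every fitted step in one artifact."""
    def __init__(self, steps):
        self.steps = steps

    def fit_transform(self, xs):
        for step in self.steps:
            xs = step.fit(xs).transform(xs)
        return xs

    def transform(self, xs):
        for step in self.steps:
            xs = step.transform(xs)
        return xs

pipe = Pipeline([StandardScaler()])
out = pipe.fit_transform([1.0, 2.0, 3.0])

# the whole fitted pipeline ships as a single artifact:
blob = pickle.dumps(pipe)
```

Because each step is its own class, a reviewer can test `StandardScaler` in isolation, and reverting one step is a one-file git operation.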
The test folder should mirror your src folder exactly, except that instead of the code itself, it holds the tests for that code. One of the most useful things I find about having tests is that during a code review, if I think someone isn't covering all the edge cases, I can easily drop that edge case into the tests and see whether it passes.
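For example, a file at `src/preprocessing/clipping.py` would get a mirror at `tests/preprocessing/test_clipping.py`. A sketch of what dropping in a reviewer's edge case looks like (the module layout and `clip_outliers` function are hypothetical, for illustration):

```python
# tests/preprocessing/test_clipping.py — mirrors src/preprocessing/clipping.py

def clip_outliers(values, lower, upper):
    """Stand-in for the function under review; normally imported from src."""
    return [min(max(v, lower), upper) for v in values]

def test_happy_path():
    assert clip_outliers([1, 5, 10], lower=2, upper=8) == [2, 5, 8]

def test_empty_input():
    # the kind of edge case a reviewer can drop in during code review
    assert clip_outliers([], lower=0, upper=1) == []
```

Run with `pytest tests/`; because the folders mirror each other, finding the tests for any module is mechanical.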
Link to the presentation that covers more of this: