This repository serves as a companion to the eBook:
- Fork this repository.
- Clone the forked repository to your local machine.
- Make sure you have the Astro CLI installed, at version 1.34.0 or later, to be able to run this Airflow 3 based project.
- Create a copy of the `.env_example` file and name it `.env`; this file contains environment variables. If you'd like to run dags using backends other than Postgres or Spark, you will need to update the connections in this file with your own values, for example to connect to a Snowflake instance.
- In the root of the project, run `astro dev start` to start the project locally. This command will spin up 8 containers on your machine, using Docker or Podman. 5 containers run Airflow:
  - Postgres: Airflow's metadata database.
  - API server: the Airflow component responsible for rendering the Airflow UI and serving three APIs, one of which is needed for task code to interact with the Airflow metadata database.
  - Scheduler: the Airflow component responsible for monitoring and triggering tasks.
  - Dag processor: the Airflow component responsible for parsing dags.
  - Triggerer: the Airflow component responsible for running the triggers of deferred tasks.
  As well as 3 additional containers, defined in the `docker-compose.override.yml` file, to test the dags against:
  - Spark master and worker: Spark containers to run the `example_DbtDag_spark` and `example_DbtTaskGroup_spark` dags.
  - Postgres: a Postgres database that most of the dag examples can be tested against locally.
  The connections to the Spark and Postgres containers are listed in the `.env_example` file and will be automatically set up if you created a `.env` file with the same content.
- You can now access the Airflow UI at `http://localhost:8080` and run the dags. All dags that are tagged with `out-of-the-box` are ready to run without further setup.
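Copying `.env_example` to `.env` is enough to wire up the connections because Airflow reads any environment variable named `AIRFLOW_CONN_<CONN_ID>` as a connection. A minimal stdlib sketch of how such a connection URI decomposes; the connection id, host, and credentials below are placeholders, not the actual values from `.env_example`:

```python
import os
from urllib.parse import urlsplit

# Hypothetical connection URI in the same shape as the entries in
# .env_example (scheme://user:password@host:port/schema). Airflow
# picks up any AIRFLOW_CONN_<CONN_ID> variable as a connection.
os.environ["AIRFLOW_CONN_POSTGRES_DATA"] = (
    "postgres://example_user:example_pw@postgres_data:5432/postgres"
)

uri = urlsplit(os.environ["AIRFLOW_CONN_POSTGRES_DATA"])
print(uri.scheme)    # connection type: postgres
print(uri.hostname)  # host of the Postgres container: postgres_data
print(uri.port)      # 5432
```

This is why no connections need to be created in the Airflow UI: the containers started by `astro dev start` inherit these variables from `.env`.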
This repository contains 26 dags.
- `advanced_examples`: 8 dags that showcase more advanced features of Cosmos.
  - `example_granular_task_dependencies_DbtDag`: This dag shows how to set dependencies between tasks outside of the dbt project and individual tasks inside the dbt project rendered with Cosmos when using `DbtDag`.
  - `example_granular_task_dependencies_DbtTaskGroup`: This dag shows how to set dependencies between tasks outside of the dbt project and individual tasks inside the dbt project rendered with Cosmos when using `DbtTaskGroup`.
  - `example_inject_dbt_vars`: This dag shows how to inject dbt vars into the dbt project at runtime.
  - `example_reduce_granularity`: This dag shows how to reduce the granularity of the dbt project by running parts of it with the `DbtBuildLocalOperator`.
  - `example_use_profiles_yml`: This dag shows how to use a `profiles.yml` file instead of a `ProfileMapping` and an Airflow connection.
  - `example_dbt_docs`: This dag shows how to use the dbt docs feature in Airflow 2 (not available in Airflow 3 at the time of writing).
  - `cosmos_assets`: This dag shows how to schedule downstream dags based on asset events automatically generated by Cosmos.
  - `example_async_dbt_project`: This dag shows how to use the `async` execution mode for a Cosmos dbt project with BigQuery.
- `basic_examples_per_dwh`: 12 dags that showcase the basic use of Cosmos with different data warehouses.
  - `example_DbtDag_duckdb`: This dag shows how to use Cosmos with DuckDB.
  - `example_DbtDag_postgres`: This dag shows how to use Cosmos with Postgres.
  - `example_DbtDag_spark`: This dag shows how to use Cosmos with Spark.
  - `example_DbtDag_snowflake`: This dag shows how to use Cosmos with Snowflake.
  - `example_DbtDag_databricks`: This dag shows how to use Cosmos with Databricks.
  - `example_DbtDag_bigquery`: This dag shows how to use Cosmos with BigQuery.
  - `example_DbtTaskGroup_duckdb`: This dag shows how to use Cosmos with DuckDB.
  - `example_DbtTaskGroup_postgres`: This dag shows how to use Cosmos with Postgres.
  - `example_DbtTaskGroup_spark`: This dag shows how to use Cosmos with Spark.
  - `example_DbtTaskGroup_snowflake`: This dag shows how to use Cosmos with Snowflake.
  - `example_DbtTaskGroup_databricks`: This dag shows how to use Cosmos with Databricks.
  - `example_DbtTaskGroup_bigquery`: This dag shows how to use Cosmos with BigQuery.
- `complex_examples`: 2 dags that showcase a more complex example.
  - `customer_360`: This dag shows how to use Cosmos to build a customer 360 view. This dag uses several performance optimizations, such as:
    - Using the `dbt_manifest` load mode.
    - Pre-computing dbt deps.
    - Rendering models and their respective tests as a single node with `TestBehavior.BUILD`.
  - `customer_360_snowflake`: This dag shows how to use Cosmos to build a customer 360 view with Snowflake. This dag does not use performance optimizations.
- `cosmos_and_dbt_fusion`: 1 dag that showcases the integration of Cosmos and dbt Fusion.
  - `example_dbt_fusion`: This dag shows how to use Cosmos and dbt Fusion together. This dag uses the `DbtDag` class to render the dbt project.
- `other_examples`: 2 dags that showcase other examples of how to use Cosmos.
All dbt projects are located in the `include/dbt` folder.