Feat: Allow specifying a minimum number of intervals to include for each model in a plan by erindru · Pull Request #4780 · SQLMesh/sqlmesh

erindru · 2025-06-22T23:01:23Z

This addresses part of issue #4069, albeit in a slightly different way from what is described in the ticket.

This PR adds a new plan option, --min-intervals, intended to be used like so:

sqlmesh plan dev --start '2 weeks ago' --min-intervals 1

What this option does is ensure that all models in the new environment have at least 1 interval backfilled, even if their interval unit is larger than the relative time period specified for --start.

It does this by allowing a list of per-model start date overrides to be supplied to a plan (similar to the existing interval_end_per_model argument). If a start date override is available for a given snapshot, it is used; otherwise the plan start date is used.

Thus, --min-intervals is implemented in terms of calculating the earliest start date that would be needed to cover --min-intervals intervals. If this calculated date is earlier than the plan start date, it is added to the start date overrides.
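A minimal sketch of that calculation, under simplifying assumptions: interval units are limited to "hour", "day" and "month", and the helper names (`floor_to_unit`, `step_back`, `earliest_start`, `start_overrides_for`) are hypothetical, not SQLMesh's actual API.

```python
from datetime import datetime, timedelta


def floor_to_unit(dt: datetime, unit: str) -> datetime:
    """Truncate dt to the start of the interval that contains it."""
    if unit == "hour":
        return dt.replace(minute=0, second=0, microsecond=0)
    if unit == "day":
        return dt.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "month":
        return dt.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported unit: {unit}")


def step_back(boundary: datetime, unit: str) -> datetime:
    """Move from one interval boundary to the previous one."""
    if unit == "hour":
        return boundary - timedelta(hours=1)
    if unit == "day":
        return boundary - timedelta(days=1)
    if unit == "month":
        # boundary is always the first of a month here, so replace() is safe
        if boundary.month == 1:
            return boundary.replace(year=boundary.year - 1, month=12)
        return boundary.replace(month=boundary.month - 1)
    raise ValueError(f"unsupported unit: {unit}")


def earliest_start(end: datetime, unit: str, min_intervals: int) -> datetime:
    """Earliest start that covers min_intervals complete intervals before end."""
    start = floor_to_unit(end, unit)
    for _ in range(min_intervals):
        start = step_back(start, unit)
    return start


def start_overrides_for(plan_start, plan_end, unit_by_model, min_intervals):
    """Only record an override when it is earlier than the plan start."""
    overrides = {}
    for model, unit in unit_by_model.items():
        candidate = earliest_start(plan_end, unit, min_intervals)
        if candidate < plan_start:
            overrides[model] = candidate
    return overrides


plan_end = datetime(2025, 6, 22, 12, 0)
plan_start = plan_end - timedelta(weeks=2)  # stands in for --start '2 weeks ago'
overrides = start_overrides_for(
    plan_start,
    plan_end,
    {"daily_model": "day", "monthly_model": "month"},
    min_intervals=1,
)
```

In this sketch the daily model already has complete intervals inside the two-week window, so only the monthly model receives an override.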

The start date overrides are used by:

DeployabilityIndex.create(), to ensure that the adjusted per-model start date still results in deployable data
missing_intervals(), to override the start date passed to Snapshot.missing_intervals() so that it can return intervals outside the default plan bounds

The immediate use-case is for PR environments created by the CI/CD bot which would allow you to say things like:

"always create PR envs with 2 weeks' worth of data in them" and still have this include monthly models
"always create PR envs with 1 day's worth of data in them" and still have this include weekly and monthly models

Right now these models are excluded, which can result in downstream daily models missing data in PR envs.

This option could be extended in the future to also specify the minimum number of intervals to cover for dev previews.

izeigerman · 2025-06-24T20:48:01Z

+@click.option(
+    "--min-intervals",
+    default=0,
+    help="In new environments created against a specific time range, ensure that models contain at least this many intervals",


I don't think this only impacts new environments, does it?

Yes, that was meant to say dev environments (there was a check that threw an error if you specified this on prod).

I've updated the text and removed the dev environment check, because specifying this on prod is harmless: it doesn't do anything, since prod doesn't support --start and --end and so already considers the full time range.

izeigerman · 2025-06-30T17:26:32Z

+            if not snapshot:
+                continue
+
+            starting_point = plan_end_dt


Shouldn't we use interval_end_per_model here instead of a global end?

Since it's an explicit goal not to backfill anything beyond what exists in prod, yes.

I've updated this
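As a rough illustration of the change discussed here, the anchor from which the minimum intervals are counted back can prefer a per-model end over the global plan end. The function name `starting_point_for` and the use of datetime values in the mapping are assumptions made for this sketch, not the merged code.

```python
from datetime import datetime


def starting_point_for(model_name, plan_end, interval_end_per_model):
    """Prefer the per-model end (for example, what exists in prod) over the plan end."""
    return interval_end_per_model.get(model_name, plan_end)


plan_end_dt = datetime(2025, 6, 30)
interval_end_per_model = {"monthly_model": datetime(2025, 6, 1)}

monthly_anchor = starting_point_for("monthly_model", plan_end_dt, interval_end_per_model)
daily_anchor = starting_point_for("daily_model", plan_end_dt, interval_end_per_model)
```

Counting back from the per-model end keeps the extra intervals within the range that the model already covers, rather than reaching up to the global end.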

izeigerman · 2025-06-30T17:28:04Z

            ),
            end_bounded=not run,
            ensure_finalized_snapshots=self.config.plan.use_finalized_state,
+            start_override_per_model=start_override_per_model,


Should the name be consistent with interval_end_per_model? I don't care which one it is, but I feel like they represent similar things and should be named similarly.

I've renamed interval_end_per_model to end_override_per_model on the Plan to match.

izeigerman · 2025-06-30T17:29:49Z

-        snapshot_start_date = start_dt
+
+        snapshot_start_override = start_override_per_model.get(snapshot.name, None)
+        snapshot_start_date = snapshot_start_override or start_dt


snapshot_start_date = start_override_per_model.get(snapshot.name, start_dt)

?

Good point, updated
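For reference, a small sketch of how the two lookups compare. The function names and sample values are hypothetical; only `start_override_per_model` and `start_dt` come from the diff above.

```python
from datetime import datetime

start_dt = datetime(2025, 6, 8)


def start_with_or(name, start_override_per_model):
    # Original form: two steps, falls back whenever the stored value is falsy
    override = start_override_per_model.get(name, None)
    return override or start_dt


def start_with_default(name, start_override_per_model):
    # Suggested form: falls back only when the key is absent
    return start_override_per_model.get(name, start_dt)


overrides = {"monthly_model": datetime(2025, 5, 1)}

same_when_present = (
    start_with_or("monthly_model", overrides)
    == start_with_default("monthly_model", overrides)
)
same_when_missing = (
    start_with_or("daily_model", overrides)
    == start_with_default("daily_model", overrides)
)

# The forms differ only if a falsy value such as None is stored under the key
with_none = {"daily_model": None}
or_result = start_with_or("daily_model", with_none)
default_result = start_with_default("daily_model", with_none)
```

As long as the mapping only ever holds real start values, both forms behave the same and the one-liner is simply shorter.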

izeigerman · 2025-07-03T20:29:06Z

+        # for example, A(hourly) <- B(daily)
+        # if min_intervals=1, A would have 1 hour and B would have 1 day
+        # but B depends on A so in order for B to have 1 valid day, A needs to be expanded to 24 hours
+        backfill_dag = self.dag.prune(*backfill_model_fqns)


Are you sure we can reuse this DAG? Wouldn't the loaded DAG be different if they use a selector?

I thought it was fine due to the pruning, but I've updated the code to construct a new DAG.

izeigerman · 2025-07-03T20:35:16Z

+        ]
+
+        # start from the leaf nodes and work back towards the root because the min_start at the root node is determined by the calculated starts in the leaf nodes
+        for subdag in reversed_subdags:


Wouldn't this contain overlapping subdags? Why can't we reverse the whole DAG and just traverse it in one go? So something like:

```
reversed_dag = dag.reversed
for model_fqn in reversed_dag:
    snapshot = snapshots_by_model_fqn[model_fqn]
    # Get the minimum start from all immediate children of this snapshot
    min_child_start = min([
        start_overrides.get(fqn, sys.max)
        for fqn in reversed_dag.get(model_fqn, set())
    ])
    # Proceed with computing the start for this snapshot and taking a min of computed start and min_child_start
```

Yep, the key difference here is only checking immediate children which I missed when scanning the DAG API originally (I thought it had to be all downstream nodes).

I've adjusted it as per your suggestion and also set an override whether it's needed or not so there is always a value for each node in the start_overrides dict
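A self-contained sketch of that traversal, using the standard library's `graphlib` instead of SQLMesh's DAG class. The function name `propagate_starts` and both input mappings are assumptions for illustration; the merged implementation may differ.

```python
from datetime import datetime, timedelta
from graphlib import TopologicalSorter


def propagate_starts(parents_by_model, own_start_by_model):
    """Widen each model's start so it is never later than any immediate child's start."""
    # Invert the dependency mapping: model -> models that read from it
    children_by_model = {model: set() for model in parents_by_model}
    for model, parents in parents_by_model.items():
        for parent in parents:
            children_by_model.setdefault(parent, set()).add(model)

    # TopologicalSorter yields predecessors first; passing children as the
    # "predecessors" makes leaf models come out before their parents.
    order = TopologicalSorter(children_by_model).static_order()

    start_overrides = {}
    for model in order:
        candidates = [own_start_by_model[model]]
        # Children were visited earlier, so their final starts are already known
        candidates.extend(start_overrides[child] for child in children_by_model[model])
        # Always record a value, whether or not it was widened
        start_overrides[model] = min(candidates)
    return start_overrides


end = datetime(2025, 7, 3)
# A(hourly) <- B(daily): B depends on A
parents_by_model = {"a": set(), "b": {"a"}}
own_start_by_model = {"a": end - timedelta(hours=1), "b": end - timedelta(days=1)}

starts = propagate_starts(parents_by_model, own_start_by_model)
```

Because each child's entry already reflects its own children, checking immediate children is enough to account for every downstream model.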

izeigerman · 2025-07-04T02:18:39Z

            return None, None

-        default_end = max(max_interval_end_per_model.values())
+        default_end = to_timestamp(max(max_interval_end_per_model.values()))


Just curious, why did we change this to datetime only to convert back to timestamp later?

Rome wasn't built in a day, and the rest of the code in that method used ints.

One day I hope we will use proper types internally and push TimeLike and co back to the edges / user input handling only.

erindru force-pushed the erin/pr-min-intervals branch 3 times, most recently from 8db1a48 to 237c765, on June 23, 2025 22:33

erindru changed the title ~~Feat: Allow specifying a minimum number of intervals to include for dev plans with a relative start date~~ Feat: Allow specifying a minimum number of intervals to include for dev plans Jun 23, 2025

izeigerman reviewed Jun 24, 2025

Comment thread sqlmesh/core/snapshot/definition.py Outdated

izeigerman reviewed Jun 24, 2025

Comment thread sqlmesh/core/snapshot/definition.py Outdated

erindru force-pushed the erin/pr-min-intervals branch from 237c765 to 718ea0f on June 27, 2025 01:46

erindru changed the title ~~Feat: Allow specifying a minimum number of intervals to include for dev plans~~ Feat: Allow specifying a minimum number of intervals to check during missing_intervals() Jun 27, 2025

erindru force-pushed the erin/pr-min-intervals branch from 718ea0f to 0ed5945 on June 30, 2025 04:31

erindru changed the title ~~Feat: Allow specifying a minimum number of intervals to check during missing_intervals()~~ Feat: Allow specifying a minimum number of intervals to include for each model in a plan Jun 30, 2025

erindru force-pushed the erin/pr-min-intervals branch from 0ed5945 to 3a1a7c3 on June 30, 2025 04:39

izeigerman reviewed Jun 30, 2025

Comment thread sqlmesh/core/snapshot/definition.py Outdated

erindru force-pushed the erin/pr-min-intervals branch 4 times, most recently from 02bbb5f to 9c5dfc2, on July 3, 2025 04:07

izeigerman reviewed Jul 3, 2025

erindru added 2 commits July 3, 2025 21:59

1fb7b50 Feat: Allow specifying a minimum number of intervals to include for each model in a plan

1e56590 Widen parent start date override if it ends up being later than a child start date override

erindru force-pushed the erin/pr-min-intervals branch from 9c5dfc2 to 1e56590 on July 3, 2025 21:59

izeigerman reviewed Jul 4, 2025

izeigerman approved these changes Jul 4, 2025

erindru merged commit 61455f2 into main Jul 4, 2025
27 checks passed

erindru deleted the erin/pr-min-intervals branch July 4, 2025 02:31

erindru mentioned this pull request Jul 4, 2025

Feat(cicd_bot): Document and enable the min_intervals plan option #4901

Merged
