Merged
@@ -2,7 +2,9 @@ DESCRIPTION >
- `project_insights_copy_ds` contains materialized project insights data.
- Populated by `project_insights_copy.pipe` copy pipe.
- Includes project metadata, health score, first commit, and activity metrics for last 365 days and previous 365 days.
- `id` column is the primary key identifier for the project.
- `id` column is the primary key identifier for the project or repository.
- `type` column indicates the record type: 'project' for project insights or 'repo' for repository insights.
- `repoUrl` column is the full repository URL for repo type records (empty string for project type).
- `name` column is the human-readable project name.
- `slug` column is the URL-friendly identifier used in routing and filtering.
- `logoUrl` column is the URL to the project's logo image.
@@ -35,6 +37,8 @@ TAGS "Project insights", "Metrics"

SCHEMA >
`id` String,
`type` String,
`repoUrl` String,
`name` String,
`slug` String,
`logoUrl` String,
@@ -64,4 +68,4 @@ SCHEMA >
`activeOrganizationsPrevious365Days` UInt64

ENGINE MergeTree
ENGINE_SORTING_KEY id
ENGINE_SORTING_KEY type, id
@@ -0,0 +1,71 @@
DESCRIPTION >
- `repo_health_score_copy_ds` contains comprehensive health score metrics and benchmarks per repository.
- Created via copy pipe with computed health metrics for repository-level analytics.
- Aggregates multiple health dimensions including contributors, popularity, development activity, and security.
- `channel` is the repository URL used as the primary key.
- `activeContributors` is the unique contributor count for the previous quarter.
- `activeContributorsBenchmark` is the benchmark score (0-5) for active contributors.
- `contributorDependencyCount` is the number of top contributors whose combined share of contributions first reaches 51% (bus factor); a lower count means higher concentration risk.
- `contributorDependencyPercentage` is the combined contribution percentage of dependent contributors.
- `contributorDependencyBenchmark` is the benchmark score (0-5) for contributor dependency.
- `organizationDependencyCount` is the number of top organizations whose combined share of contributions first reaches 51%; a lower count means higher concentration risk.
- `organizationDependencyPercentage` is the combined contribution percentage of dependent organizations.
- `organizationDependencyBenchmark` is the benchmark score (0-5) for organization dependency.
- `retentionRate` is the quarter-over-quarter contributor retention percentage.
- `retentionBenchmark` is the benchmark score (0-5) for retention.
- `stars` is the total star count for the repository.
- `starsBenchmark` is the benchmark score (0-5) for stars.
- `forks` is the total fork count for the repository.
- `forksBenchmark` is the benchmark score (0-5) for forks.
- `issueResolution` is the average days to close issues (nullable for repos without issues).
- `issueResolutionBenchmark` is the benchmark score (0-5) for issue resolution.
- `pullRequests` is the PR count in the last 365 days.
- `pullRequestsBenchmark` is the benchmark score (0-5) for pull requests.
- `mergeLeadTime` is the average days to merge PRs (nullable for repos without PRs).
- `mergeLeadTimeBenchmark` is the benchmark score (0-5) for merge lead time.
- `activeDaysCount` is the count of distinct active days in the last 365 days.
- `activeDaysBenchmark` is the benchmark score (0-5) for active days.
- `contributionsOutsideWorkHours` is the percentage of contributions outside work hours.
- `contributionsOutsideWorkHoursBenchmark` is the benchmark score (0-5) for outside work hours.
- `securityPercentage` is the health score percentage for the security category (0-100).
- `contributorPercentage` is the health score percentage for the contributors category (0-100).
- `popularityPercentage` is the health score percentage for the popularity category (0-100).
- `developmentPercentage` is the health score percentage for the development category (0-100).
- `overallScore` is the computed overall health score combining all dimensions.

TAGS "Repository health", "Metrics"

SCHEMA >
`channel` String,
`activeContributors` UInt64,
`activeContributorsBenchmark` UInt64,
`contributorDependencyCount` UInt64,
`contributorDependencyPercentage` Float64,
`contributorDependencyBenchmark` UInt64,
`organizationDependencyCount` UInt64,
`organizationDependencyPercentage` Float64,
`organizationDependencyBenchmark` UInt64,
`retentionRate` Float64,
`retentionBenchmark` UInt64,
`stars` UInt64,
`starsBenchmark` UInt64,
`forks` UInt64,
`forksBenchmark` UInt64,
`issueResolution` Nullable(Float64),
`issueResolutionBenchmark` UInt64,
`pullRequests` UInt64,
`pullRequestsBenchmark` UInt64,
`mergeLeadTime` Nullable(Float64),
`mergeLeadTimeBenchmark` UInt64,
`activeDaysCount` UInt64,
`activeDaysBenchmark` UInt64,
`contributionsOutsideWorkHours` Float64,
`contributionsOutsideWorkHoursBenchmark` UInt64,
`securityPercentage` Float64,
`contributorPercentage` Float64,
`popularityPercentage` Float64,
`developmentPercentage` Float64,
`overallScore` Float64

ENGINE MergeTree
ENGINE_SORTING_KEY channel
@@ -0,0 +1,27 @@
DESCRIPTION >
- `repositories_populated_ds` contains enriched repository data with computed metrics.
- Populated by `repositories_populated_copy.pipe` copy pipe.
- Extends base repository data with contributor counts, software valuation, and first commit timestamp.
- `id` is the primary key identifier for the repository record.
- `url` is the full repository URL.
- `segmentId` links to the segment this repository belongs to.
- `insightsProjectId` links to the insights project this repository is associated with.
- `contributorCount` is the total number of unique contributors for the repository.
- `organizationCount` is the total number of unique organizations for the repository.
- `softwareValue` is the estimated economic value of the repository software.
- `firstCommit` is the timestamp of the first commit in the repository (nullable).

TAGS "Repository metadata", "Analytics enrichment"

SCHEMA >
`id` String,
`url` String,
`segmentId` String,
`insightsProjectId` String,
`contributorCount` UInt64,
`organizationCount` UInt64,
`softwareValue` UInt64,
`firstCommit` Nullable(DateTime64(3))

ENGINE MergeTree
ENGINE_SORTING_KEY id, url
@@ -0,0 +1,16 @@
NODE health_score_active_contributors_benchmark
SQL >
%
SELECT
$GROUP_COL,
activeContributors,
CASE
WHEN activeContributors BETWEEN 0 AND 1 THEN 0
WHEN activeContributors BETWEEN 2 AND 3 THEN 1
WHEN activeContributors BETWEEN 4 AND 6 THEN 2
WHEN activeContributors BETWEEN 7 AND 10 THEN 3
WHEN activeContributors BETWEEN 11 AND 20 THEN 4
WHEN activeContributors > 20 THEN 5
ELSE 0
END AS activeContributorsBenchmark
FROM $SOURCE_NODE
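
The CASE ladder above maps a contributor count to a 0-5 score through fixed cut-offs. As a rough illustration outside Tinybird, the same mapping can be sketched in Python with a sorted list of lower bounds. The function name `active_contributors_benchmark`, the constant name, and the use of `bisect` are assumptions made for this sketch and are not part of the PR.

```python
from bisect import bisect_right

# Lower bounds of scores 1..5, copied from the CASE expression above:
# 0-1 -> 0, 2-3 -> 1, 4-6 -> 2, 7-10 -> 3, 11-20 -> 4, >20 -> 5.
ACTIVE_CONTRIBUTOR_LOWER_BOUNDS = [2, 4, 7, 11, 21]


def active_contributors_benchmark(active_contributors: int) -> int:
    """Return the 0-5 benchmark for an integer contributor count (hypothetical helper)."""
    if active_contributors < 0:
        return 0  # mirrors the ELSE 0 fallback; UInt64 columns never get here
    # The number of lower bounds that are <= the value is the score.
    return bisect_right(ACTIVE_CONTRIBUTOR_LOWER_BOUNDS, active_contributors)
```

A table of lower bounds keeps the bands contiguous by construction, which is one way to cross-check the hand-written ranges in the include.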
16 changes: 16 additions & 0 deletions services/libs/tinybird/includes/health_score_active_days.incl
@@ -0,0 +1,16 @@
NODE health_score_active_days_benchmark
SQL >
%
SELECT
$GROUP_COL,
activeDaysCount,
CASE
WHEN activeDaysCount BETWEEN 0 AND 5 THEN 0
WHEN activeDaysCount BETWEEN 6 AND 10 THEN 1
WHEN activeDaysCount BETWEEN 11 AND 15 THEN 2
WHEN activeDaysCount BETWEEN 16 AND 20 THEN 3
WHEN activeDaysCount BETWEEN 21 AND 26 THEN 4
WHEN activeDaysCount > 26 THEN 5
ELSE 0
END AS activeDaysBenchmark
FROM $SOURCE_NODE
@@ -0,0 +1,16 @@
NODE health_score_contributions_outside_work_hours_benchmark
SQL >
%
SELECT
$GROUP_COL,
contributionsOutsideWorkHours,
CASE
WHEN contributionsOutsideWorkHours >= 75 THEN 0
WHEN contributionsOutsideWorkHours BETWEEN 50 AND 74 THEN 1
WHEN contributionsOutsideWorkHours BETWEEN 40 AND 49 THEN 2
WHEN contributionsOutsideWorkHours BETWEEN 30 AND 39 THEN 3
WHEN contributionsOutsideWorkHours BETWEEN 20 AND 29 THEN 4
WHEN contributionsOutsideWorkHours BETWEEN 0 AND 19 THEN 5
ELSE 0
END AS contributionsOutsideWorkHoursBenchmark
FROM $SOURCE_NODE
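
In the `repo_health_score_copy_ds` schema, `contributionsOutsideWorkHours` is a `Float64`, while the bands above use inclusive integer bounds. If the upstream node does not round the value (this diff does not show whether it does), a fractional input such as 19.5 matches no band and falls through to `ELSE 0`. The Python sketch below mirrors the CASE expression and contrasts it with a hypothetical half-open variant; both function names are made up for illustration.

```python
def benchmark_as_written(pct: float) -> int:
    """Mirror of the CASE expression above, using inclusive integer bounds."""
    if pct >= 75:
        return 0
    if 50 <= pct <= 74:
        return 1
    if 40 <= pct <= 49:
        return 2
    if 30 <= pct <= 39:
        return 3
    if 20 <= pct <= 29:
        return 4
    if 0 <= pct <= 19:
        return 5
    return 0  # ELSE branch


def benchmark_half_open(pct: float) -> int:
    """Hypothetical variant: each band is [lower, next lower), so no gaps."""
    if pct < 0:
        return 0
    # Bands are checked from the highest lower bound down.
    for lower, score in ((75, 0), (50, 1), (40, 2), (30, 3), (20, 4), (0, 5)):
        if pct >= lower:
            return score
    return 0
```

The two functions agree on every whole number from 0 to 100 and differ only on fractional values between bands, so the distinction matters only when the input is not rounded beforehand.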
@@ -0,0 +1,53 @@
NODE health_score_contributor_dependency_pct
SQL >
%
SELECT
$GROUP_COL,
memberId,
contributionCount,
ROUND(contributionCount * 100.0 / SUM(contributionCount) OVER (PARTITION BY $GROUP_COL), 2) AS contributionPercentage
FROM $SOURCE_NODE
ORDER BY contributionPercentage DESC

NODE health_score_contributor_dependency_running
SQL >
%
SELECT
$GROUP_COL,
memberId,
contributionPercentage,
SUM(contributionPercentage) OVER (
PARTITION BY $GROUP_COL ORDER BY contributionPercentage DESC, memberId
) AS contributionPercentageRunningTotal
FROM health_score_contributor_dependency_pct

NODE health_score_contributor_dependency_score
SQL >
%
SELECT
$GROUP_COL,
count() AS contributorDependencyCount,
round(sum(contributionPercentage)) AS contributorDependencyPercentage
FROM health_score_contributor_dependency_running
WHERE
contributionPercentageRunningTotal < 51
OR (contributionPercentageRunningTotal - contributionPercentage < 51)
GROUP BY $GROUP_COL

NODE health_score_contributor_dependency_benchmark
SQL >
%
SELECT
$GROUP_COL,
contributorDependencyCount,
contributorDependencyPercentage,
CASE
WHEN contributorDependencyCount BETWEEN 0 AND 1 THEN 0
WHEN contributorDependencyCount = 2 THEN 1
WHEN contributorDependencyCount BETWEEN 3 AND 4 THEN 2
WHEN contributorDependencyCount BETWEEN 5 AND 6 THEN 3
WHEN contributorDependencyCount BETWEEN 7 AND 9 THEN 4
WHEN contributorDependencyCount > 9 THEN 5
ELSE 0
END AS contributorDependencyBenchmark
FROM health_score_contributor_dependency_score
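
The first three nodes above compute each member's share of contributions, accumulate the shares from largest to smallest, and keep every member up to and including the one whose running total first reaches 51. A minimal Python sketch of that logic for a single group follows. The function name and the dictionary input are assumptions for illustration, and rounding at exact halves may differ from ClickHouse's `round`.

```python
from __future__ import annotations


def contributor_dependency(contribution_counts: dict[str, int]) -> tuple[int, int]:
    """Return (dependency count, combined percentage) for one group (hypothetical helper)."""
    total = sum(contribution_counts.values())
    if total == 0:
        return 0, 0  # the SQL would simply emit no row for an empty group

    # Share per member, rounded to 2 decimals as in the *_pct node,
    # ordered by share descending with member id as the tie-breaker.
    shares = sorted(
        (
            (round(count * 100.0 / total, 2), member_id)
            for member_id, count in contribution_counts.items()
        ),
        key=lambda item: (-item[0], item[1]),
    )

    running = 0.0
    dependency_count = 0
    dependency_pct = 0.0
    for pct, _member_id in shares:
        previous_running = running
        running += pct
        # Same filter as the WHERE clause: keep rows while the running total
        # is below 51, plus the row that crosses the threshold.
        if running < 51 or previous_running < 51:
            dependency_count += 1
            dependency_pct += pct
    return dependency_count, round(dependency_pct)
```

For example, counts of 60, 30, and 10 give a dependency count of 1 with 60%, which the benchmark node then scores as 0.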
16 changes: 16 additions & 0 deletions services/libs/tinybird/includes/health_score_forks.incl
@@ -0,0 +1,16 @@
NODE health_score_forks_benchmark
SQL >
%
SELECT
$GROUP_COL,
forks,
CASE
WHEN forks BETWEEN 0 AND 4 THEN 0
WHEN forks BETWEEN 5 AND 9 THEN 1
WHEN forks BETWEEN 10 AND 19 THEN 2
WHEN forks BETWEEN 20 AND 39 THEN 3
WHEN forks BETWEEN 40 AND 79 THEN 4
WHEN forks >= 80 THEN 5
ELSE 0
END AS forksBenchmark
FROM $SOURCE_NODE
@@ -0,0 +1,16 @@
NODE health_score_issues_resolution_benchmark
SQL >
%
SELECT
$GROUP_COL,
issueResolution,
CASE
WHEN issueResolution >= 61 THEN 0
WHEN issueResolution BETWEEN 51 AND 60 THEN 1
WHEN issueResolution BETWEEN 36 AND 50 THEN 2
WHEN issueResolution BETWEEN 22 AND 35 THEN 3
WHEN issueResolution BETWEEN 8 AND 21 THEN 4
WHEN issueResolution BETWEEN 0 AND 7 THEN 5
ELSE 0
END AS issueResolutionBenchmark
FROM $SOURCE_NODE
16 changes: 16 additions & 0 deletions services/libs/tinybird/includes/health_score_merge_lead_time.incl
@@ -0,0 +1,16 @@
NODE health_score_merge_lead_time_benchmark
SQL >
%
SELECT
$GROUP_COL,
mergeLeadTime,
CASE
WHEN mergeLeadTime > 30 THEN 0
WHEN mergeLeadTime BETWEEN 21 AND 30 THEN 1
WHEN mergeLeadTime BETWEEN 15 AND 20 THEN 2
WHEN mergeLeadTime BETWEEN 7 AND 14 THEN 3
WHEN mergeLeadTime BETWEEN 3 AND 6 THEN 4
WHEN mergeLeadTime BETWEEN 0 AND 2 THEN 5
ELSE 0
END AS mergeLeadTimeBenchmark
FROM $SOURCE_NODE
@@ -0,0 +1,53 @@
NODE health_score_organization_dependency_pct
SQL >
%
SELECT
$GROUP_COL,
organizationId,
contributionCount,
(contributionCount * 100.0 / SUM(contributionCount) OVER (PARTITION BY $GROUP_COL)) AS contributionPercentage
FROM $SOURCE_NODE
ORDER BY contributionPercentage DESC

NODE health_score_organization_dependency_running
SQL >
%
SELECT
$GROUP_COL,
organizationId,
contributionPercentage,
SUM(contributionPercentage) OVER (
PARTITION BY $GROUP_COL ORDER BY contributionPercentage DESC, organizationId
) AS contributionPercentageRunningTotal
FROM health_score_organization_dependency_pct

NODE health_score_organization_dependency_score
SQL >
%
SELECT
$GROUP_COL,
count() AS organizationDependencyCount,
round(sum(contributionPercentage)) AS organizationDependencyPercentage
FROM health_score_organization_dependency_running
WHERE
contributionPercentageRunningTotal < 51
OR (contributionPercentageRunningTotal - contributionPercentage < 51)
GROUP BY $GROUP_COL

NODE health_score_organization_dependency_benchmark
SQL >
%
SELECT
$GROUP_COL,
organizationDependencyCount,
organizationDependencyPercentage,
CASE
WHEN organizationDependencyCount BETWEEN 0 AND 1 THEN 0
WHEN organizationDependencyCount = 2 THEN 1
WHEN organizationDependencyCount = 3 THEN 2
WHEN organizationDependencyCount BETWEEN 4 AND 5 THEN 3
WHEN organizationDependencyCount BETWEEN 6 AND 7 THEN 4
WHEN organizationDependencyCount >= 8 THEN 5
ELSE 0
END AS organizationDependencyBenchmark
FROM health_score_organization_dependency_score
16 changes: 16 additions & 0 deletions services/libs/tinybird/includes/health_score_pull_requests.incl
@@ -0,0 +1,16 @@
NODE health_score_pull_requests_benchmark
SQL >
%
SELECT
$GROUP_COL,
pullRequests,
CASE
WHEN pullRequests BETWEEN 0 AND 1 THEN 0
WHEN pullRequests BETWEEN 2 AND 3 THEN 1
WHEN pullRequests BETWEEN 4 AND 7 THEN 2
WHEN pullRequests BETWEEN 8 AND 15 THEN 3
WHEN pullRequests BETWEEN 16 AND 30 THEN 4
WHEN pullRequests >= 31 THEN 5
ELSE 0
END AS pullRequestsBenchmark
FROM $SOURCE_NODE
34 changes: 34 additions & 0 deletions services/libs/tinybird/includes/health_score_retention.incl
@@ -0,0 +1,34 @@
NODE health_score_retention_counts
SQL >
%
SELECT
cur.$GROUP_COL AS $GROUP_COL,
if(
length(coalesce(prev.previousQuarterMembers, [])) > 0,
round(
100 * length(arrayIntersect(
coalesce(cur.currentQuarterMembers, []),
coalesce(prev.previousQuarterMembers, [])
)) / length(coalesce(prev.previousQuarterMembers, []))
),
0
) AS retentionRate
FROM $SOURCE_CURRENT AS cur
LEFT JOIN $SOURCE_PREVIOUS AS prev USING ($GROUP_COL)

NODE health_score_retention_benchmark
SQL >
%
SELECT
$GROUP_COL,
retentionRate,
CASE
WHEN retentionRate BETWEEN 0 AND 2 THEN 0
WHEN retentionRate BETWEEN 3 AND 5 THEN 1
WHEN retentionRate BETWEEN 6 AND 9 THEN 2
WHEN retentionRate BETWEEN 10 AND 14 THEN 3
WHEN retentionRate BETWEEN 15 AND 19 THEN 4
WHEN retentionRate >= 20 THEN 5
ELSE 0
END AS retentionBenchmark
FROM health_score_retention_counts
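
The retention nodes above take the share of previous-quarter members who also appear in the current quarter, then bucket that percentage. A small Python sketch of both steps follows. It assumes the member arrays hold unique ids, and the function names are hypothetical; rounding at exact halves may differ from ClickHouse's `round`.

```python
from bisect import bisect_right

# Lower bounds of scores 1..5 from the CASE expression above:
# 0-2 -> 0, 3-5 -> 1, 6-9 -> 2, 10-14 -> 3, 15-19 -> 4, >=20 -> 5.
RETENTION_LOWER_BOUNDS = [3, 6, 10, 15, 20]


def retention_rate(current_members, previous_members) -> int:
    """Percentage of previous-quarter members still active (hypothetical helper)."""
    previous = set(previous_members or [])  # None is treated like coalesce(..., [])
    if not previous:
        return 0  # no previous-quarter members: the SQL returns 0 as well
    current = set(current_members or [])
    return round(100 * len(current & previous) / len(previous))


def retention_benchmark(rate: int) -> int:
    """Map a non-negative integer retention rate to the 0-5 benchmark."""
    return bisect_right(RETENTION_LOWER_BOUNDS, rate)
```

With current members `a, b, c` and previous members `a, b, d, e`, two of four previous members are retained, giving a rate of 50 and a benchmark of 5.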