Commit cc7266a

feat: repository analytics & repo populated & repo health score & health score refactor (IN-1054) (#3987)
Signed-off-by: Gašper Grom <gasper.grom@gmail.com>
1 parent 76d870a commit cc7266a

32 files changed

Lines changed: 1495 additions & 84 deletions

services/libs/tinybird/datasources/project_insights_copy_ds.datasource

Lines changed: 6 additions & 2 deletions
@@ -2,7 +2,9 @@ DESCRIPTION >
     - `project_insights_copy_ds` contains materialized project insights data.
     - Populated by `project_insights_copy.pipe` copy pipe.
     - Includes project metadata, health score, first commit, and activity metrics for last 365 days and previous 365 days.
-    - `id` column is the primary key identifier for the project.
+    - `id` column is the primary key identifier for the project or repository.
+    - `type` column indicates the record type: 'project' for project insights or 'repo' for repository insights.
+    - `repoUrl` column is the full repository URL for repo type records (empty string for project type).
     - `name` column is the human-readable project name.
     - `slug` column is the URL-friendly identifier used in routing and filtering.
     - `logoUrl` column is the URL to the project's logo image.
@@ -35,6 +37,8 @@ TAGS "Project insights", "Metrics"
 
 SCHEMA >
     `id` String,
+    `type` String,
+    `repoUrl` String,
     `name` String,
     `slug` String,
     `logoUrl` String,
@@ -64,4 +68,4 @@ SCHEMA >
     `activeOrganizationsPrevious365Days` UInt64
 
 ENGINE MergeTree
-ENGINE_SORTING_KEY id
+ENGINE_SORTING_KEY type, id
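
The effect of the `type` discriminator and the `type, id` sorting key can be pictured with a small Python sketch. The `InsightRow` class, the helper names, and the sample values are hypothetical; they mirror only the three columns named in the description above and are not taken from the copy pipe that populates the datasource.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InsightRow:
    """Hypothetical subset of a project_insights_copy_ds row."""
    id: str
    type: str      # 'project' or 'repo', per the DESCRIPTION block
    repoUrl: str   # empty string for 'project' rows


def sort_like_engine(rows):
    # Mirrors ENGINE_SORTING_KEY type, id: rows of one type end up adjacent.
    return sorted(rows, key=lambda r: (r.type, r.id))


def only_repos(rows):
    # A filter on the leading sorting-key column selects one contiguous run.
    return [r for r in rows if r.type == "repo"]


# Illustrative sample data, not real records.
rows = [
    InsightRow("b", "repo", "https://example.com/org/b"),
    InsightRow("a", "project", ""),
    InsightRow("a", "repo", "https://example.com/org/a"),
]
ordered = sort_like_engine(rows)
```

Because the same `id` value may now appear once per record type, a consumer that previously looked rows up by `id` alone would presumably also need to filter on `type`.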
Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
+DESCRIPTION >
+    - `repo_health_score_copy_ds` contains comprehensive health score metrics and benchmarks per repository.
+    - Created via copy pipe with computed health metrics for repository-level analytics.
+    - Aggregates multiple health dimensions including contributors, popularity, development activity, and security.
+    - `channel` is the repository URL used as the primary key.
+    - `activeContributors` is the unique contributor count for the previous quarter.
+    - `activeContributorsBenchmark` is the benchmark score (0-5) for active contributors.
+    - `contributorDependencyCount` measures contributor concentration risk (bus factor).
+    - `contributorDependencyPercentage` is the combined contribution percentage of dependent contributors.
+    - `contributorDependencyBenchmark` is the benchmark score (0-5) for contributor dependency.
+    - `organizationDependencyCount` measures organizational concentration risk.
+    - `organizationDependencyPercentage` is the combined contribution percentage of dependent organizations.
+    - `organizationDependencyBenchmark` is the benchmark score (0-5) for organization dependency.
+    - `retentionRate` is the quarter-over-quarter contributor retention percentage.
+    - `retentionBenchmark` is the benchmark score (0-5) for retention.
+    - `stars` is the total star count for the repository.
+    - `starsBenchmark` is the benchmark score (0-5) for stars.
+    - `forks` is the total fork count for the repository.
+    - `forksBenchmark` is the benchmark score (0-5) for forks.
+    - `issueResolution` is the average days to close issues (nullable for repos without issues).
+    - `issueResolutionBenchmark` is the benchmark score (0-5) for issue resolution.
+    - `pullRequests` is the PR count in the last 365 days.
+    - `pullRequestsBenchmark` is the benchmark score (0-5) for pull requests.
+    - `mergeLeadTime` is the average days to merge PRs (nullable for repos without PRs).
+    - `mergeLeadTimeBenchmark` is the benchmark score (0-5) for merge lead time.
+    - `activeDaysCount` is the count of distinct active days in the last 365 days.
+    - `activeDaysBenchmark` is the benchmark score (0-5) for active days.
+    - `contributionsOutsideWorkHours` is the percentage of contributions outside work hours.
+    - `contributionsOutsideWorkHoursBenchmark` is the benchmark score (0-5) for outside work hours.
+    - `securityPercentage` is the health score percentage for the security category (0-100).
+    - `contributorPercentage` is the health score percentage for the contributors category (0-100).
+    - `popularityPercentage` is the health score percentage for the popularity category (0-100).
+    - `developmentPercentage` is the health score percentage for the development category (0-100).
+    - `overallScore` is the computed overall health score combining all dimensions.
+
+TAGS "Repository health", "Metrics"
+
+SCHEMA >
+    `channel` String,
+    `activeContributors` UInt64,
+    `activeContributorsBenchmark` UInt64,
+    `contributorDependencyCount` UInt64,
+    `contributorDependencyPercentage` Float64,
+    `contributorDependencyBenchmark` UInt64,
+    `organizationDependencyCount` UInt64,
+    `organizationDependencyPercentage` Float64,
+    `organizationDependencyBenchmark` UInt64,
+    `retentionRate` Float64,
+    `retentionBenchmark` UInt64,
+    `stars` UInt64,
+    `starsBenchmark` UInt64,
+    `forks` UInt64,
+    `forksBenchmark` UInt64,
+    `issueResolution` Nullable(Float64),
+    `issueResolutionBenchmark` UInt64,
+    `pullRequests` UInt64,
+    `pullRequestsBenchmark` UInt64,
+    `mergeLeadTime` Nullable(Float64),
+    `mergeLeadTimeBenchmark` UInt64,
+    `activeDaysCount` UInt64,
+    `activeDaysBenchmark` UInt64,
+    `contributionsOutsideWorkHours` Float64,
+    `contributionsOutsideWorkHoursBenchmark` UInt64,
+    `securityPercentage` Float64,
+    `contributorPercentage` Float64,
+    `popularityPercentage` Float64,
+    `developmentPercentage` Float64,
+    `overallScore` Float64
+
+ENGINE MergeTree
+ENGINE_SORTING_KEY channel
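
The ranges documented for `repo_health_score_copy_ds` (benchmarks 0-5, category percentages 0-100, two nullable averages) lend themselves to a simple sanity check. The following Python sketch assumes rows arrive as plain dictionaries keyed by column name; the function name and the dictionary shape are hypothetical and not part of the commit.

```python
# The four category columns whose 0-100 range is stated in the DESCRIPTION.
CATEGORY_PERCENTAGES = {
    "securityPercentage",
    "contributorPercentage",
    "popularityPercentage",
    "developmentPercentage",
}


def check_health_row(row):
    """Return a list of range violations for one row dictionary."""
    problems = []
    for name, value in row.items():
        if value is None:
            # issueResolution and mergeLeadTime are Nullable(Float64).
            continue
        if name.endswith("Benchmark") and not 0 <= value <= 5:
            problems.append(f"{name}={value} outside 0-5")
        elif name in CATEGORY_PERCENTAGES and not 0 <= value <= 100:
            problems.append(f"{name}={value} outside 0-100")
    return problems
```

The check deliberately ignores `contributorDependencyPercentage`, `organizationDependencyPercentage`, and `overallScore`, since the description does not state a range for them.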
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+DESCRIPTION >
+    - `repositories_populated_ds` contains enriched repository data with computed metrics.
+    - Populated by `repositories_populated_copy.pipe` copy pipe.
+    - Extends base repository data with contributor counts, software valuation, and first commit timestamp.
+    - `id` is the primary key identifier for the repository record.
+    - `url` is the full repository URL.
+    - `segmentId` links to the segment this repository belongs to.
+    - `insightsProjectId` links to the insights project this repository is associated with.
+    - `contributorCount` is the total number of unique contributors for the repository.
+    - `organizationCount` is the total number of unique organizations for the repository.
+    - `softwareValue` is the estimated economic value of the repository software.
+    - `firstCommit` is the timestamp of the first commit in the repository (nullable).
+
+TAGS "Repository metadata", "Analytics enrichment"
+
+SCHEMA >
+    `id` String,
+    `url` String,
+    `segmentId` String,
+    `insightsProjectId` String,
+    `contributorCount` UInt64,
+    `organizationCount` UInt64,
+    `softwareValue` UInt64,
+    `firstCommit` Nullable(DateTime64(3))
+
+ENGINE MergeTree
+ENGINE_SORTING_KEY id, url

services/libs/tinybird/pipes/health_score_active_contributors.pipe

Lines changed: 4 additions & 3 deletions
@@ -1,4 +1,4 @@
-NODE health_score_active_contributors_score
+NODE health_score_active_contributors_score_something_else
 DESCRIPTION >
     Returns activeContributors for previous quarter per project
 
@@ -34,8 +34,9 @@ SQL >
     GROUP BY segmentId
     {% end %}
 
-NODE health_score_active_contributors_with_benchmark
+NODE health_score_active_contributors_benchmark
 SQL >
+    %
     SELECT
     segmentId,
     activeContributors,
@@ -54,4 +55,4 @@ SQL >
     THEN 5
     ELSE 0
     END AS activeContributorsBenchmark
-    FROM health_score_active_contributors_score
+    FROM health_score_active_contributors_score_something_else
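
The benchmark nodes map a raw metric to a 0-5 score through a `CASE` expression that ends in `ELSE 0`. The Python sketch below shows that shape only. The actual thresholds are not visible in this hunk, so `EXAMPLE_BUCKETS` holds placeholder values, and the inclusive bounds are an assumption as well.

```python
def benchmark_score(value, buckets):
    """Map a metric value to a score using (lower, upper, score) buckets.

    upper=None means the bucket is open-ended. Bounds are treated as
    inclusive, which is an assumption about the SQL, not a fact from it.
    """
    for lower, upper, score in buckets:
        if value >= lower and (upper is None or value <= upper):
            return score
    return 0  # mirrors the ELSE 0 branch of the CASE expression


# Placeholder thresholds for illustration only; NOT the values used in the pipe.
EXAMPLE_BUCKETS = [
    (1, 2, 1),
    (3, 5, 2),
    (6, 10, 3),
    (11, 20, 4),
    (21, None, 5),
]
```

With these placeholder buckets, `benchmark_score(3, EXAMPLE_BUCKETS)` falls in the second bucket, and a value below every lower bound falls through to 0.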

services/libs/tinybird/pipes/health_score_active_days.pipe

Lines changed: 2 additions & 1 deletion
@@ -27,8 +27,9 @@ SQL >
     GROUP BY segmentId
     {% end %}
 
-NODE health_score_active_days_with_benchmark
+NODE health_score_active_days_benchmark
 SQL >
+    %
     SELECT
     segmentId,
     activeDaysCount,

services/libs/tinybird/pipes/health_score_contributions_outside_work_hours.pipe

Lines changed: 2 additions & 1 deletion
@@ -29,8 +29,9 @@ SQL >
     {% end %}
     GROUP BY segmentId
 
-NODE health_score_contributions_outside_work_hours_with_benchmark
+NODE health_score_contributions_outside_work_hours_benchmark
 SQL >
+    %
     SELECT
     segmentId,
     contributionsOutsideWorkHours,

services/libs/tinybird/pipes/health_score_contributor_dependency.pipe

Lines changed: 9 additions & 6 deletions
@@ -33,8 +33,9 @@ SQL >
     ORDER by contributionCount DESC
     {% end %}
 
-NODE health_score_contributor_dependency_contribution_percentage
+NODE health_score_contributor_dependency_pct
 SQL >
+    %
     SELECT
     segmentId,
     memberId,
@@ -45,32 +46,34 @@ SQL >
     FROM health_score_contributor_dependency_contribution_count
     ORDER BY contributionPercentage DESC
 
-NODE health_score_contributor_dependency_contribution_runnning_total
+NODE health_score_contributor_dependency_running
 SQL >
+    %
     SELECT
     segmentId,
     memberId,
-    contributionCount,
     contributionPercentage,
     SUM(contributionPercentage) OVER (
     PARTITION BY segmentId ORDER BY contributionPercentage DESC, memberId
     ) AS contributionPercentageRunningTotal
-    FROM health_score_contributor_dependency_contribution_percentage
+    FROM health_score_contributor_dependency_pct
 
 NODE health_score_contributor_dependency_score
 SQL >
+    %
     SELECT
     segmentId,
     count() AS contributorDependencyCount,
     round(sum(contributionPercentage)) AS contributorDependencyPercentage
-    FROM health_score_contributor_dependency_contribution_runnning_total
+    FROM health_score_contributor_dependency_running
     WHERE
     contributionPercentageRunningTotal < 51
     OR (contributionPercentageRunningTotal - contributionPercentage < 51)
     GROUP BY segmentId
 
-NODE health_score_contributor_dependency_with_benchmark
+NODE health_score_contributor_dependency_benchmark
 SQL >
+    %
     SELECT
     segmentId,
     contributorDependencyCount,
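
The running-total logic in the `health_score_contributor_dependency_running` and `health_score_contributor_dependency_score` nodes can be traced in plain Python. This is a minimal sketch for a single segment. The percentage formula (`100 * count / total`) is an assumption because that part of the SQL is outside the hunks shown; the ordering and the `< 51` filter follow the SQL above. The function name is hypothetical.

```python
def contributor_dependency(counts):
    """Return (contributorDependencyCount, contributorDependencyPercentage).

    counts: dict mapping memberId -> contribution count for one segment.
    """
    total = sum(counts.values())
    if total == 0:
        return 0, 0
    # Assumed percentage formula; ordered by percentage DESC, then memberId,
    # matching the ORDER BY inside the window function.
    ranked = sorted(
        ((100 * c / total, member) for member, c in counts.items()),
        key=lambda pair: (-pair[0], pair[1]),
    )
    running = 0.0
    kept = []
    for pct, _member in ranked:
        running += pct  # SUM(...) OVER (PARTITION BY segmentId ORDER BY ...)
        # WHERE runningTotal < 51 OR (runningTotal - contributionPercentage < 51):
        # keep contributors until the running total first reaches 51.
        if running < 51 or running - pct < 51:
            kept.append(pct)
    return len(kept), round(sum(kept))
```

For contributions split 60/30/10, the top contributor alone crosses the 51 mark, so the sketch yields a dependency count of 1 with a combined percentage of 60. The second clause of the filter is what keeps the contributor whose share pushes the total past 51.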

services/libs/tinybird/pipes/health_score_forks.pipe

Lines changed: 2 additions & 1 deletion
@@ -28,8 +28,9 @@ SQL >
     GROUP BY segmentId
     {% end %}
 
-NODE health_score_forks_with_benchmark
+NODE health_score_forks_benchmark
 SQL >
+    %
     SELECT
     segmentId,
     forks,

services/libs/tinybird/pipes/health_score_issues_resolution.pipe

Lines changed: 2 additions & 1 deletion
@@ -38,8 +38,9 @@ SQL >
     GROUP BY segmentId
     {% end %}
 
-NODE health_score_issues_resolution_with_benchmark
+NODE health_score_issues_resolution_benchmark
 SQL >
+    %
     SELECT
     segmentId,
     issueResolution,

services/libs/tinybird/pipes/health_score_merge_lead_time.pipe

Lines changed: 2 additions & 1 deletion
@@ -26,8 +26,9 @@ SQL >
     {% end %}
     GROUP BY segmentId
 
-NODE health_score_merge_lead_time_with_benchmark
+NODE health_score_merge_lead_time_benchmark
 SQL >
+    %
     SELECT
     segmentId,
     mergeLeadTime,
