This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
- CLAUDE.md is the canonical source of truth for project context, architecture intent, and workflow conventions.
- AGENTS.md should direct agents to read CLAUDE.md and discover/use skills under `.claude/skills`.
GiGL (GIgantic Graph Learning) is an open-source library for training and inference of Graph Neural Networks at
billion-scale. It supports node classification, link prediction, and both supervised and unsupervised learning. Python
3.11, uv for package management.
```bash
# Setup
make install_dev_deps  # Full dev setup (gcloud auth, uv, pre-commit)

# Testing
make unit_test_py  # All Python unit tests (includes type_check)
# NOTE: PY_TEST_FILES should *only* be the filename, *not* the full path.
# e.g. to test `tests/unit/common/foo_test.py`, run `make unit_test_py PY_TEST_FILES="foo_test.py"`
make unit_test_py PY_TEST_FILES="specific_test.py"      # Single test file
make integration_test PY_TEST_FILES="specific_test.py"  # Integration (run one at a time, slow)

# Formatting & Linting
make format        # Auto-fix Python, Scala, Markdown
make format_py     # Auto-fix Python only
make format_scala  # Auto-fix Scala only
make format_md     # Auto-fix Markdown only
make check_format  # Check without fixing
make type_check    # mypy static type checking

# Build
make compile_protos  # Regenerate protobuf code after .proto changes
make build_docs      # Sphinx documentation
```

GiGL runs as a multi-stage pipeline. Each stage is a standalone runnable module:
- ConfigPopulator (`config_populator/`) - Deserializes YAML task configs into protobuf
- DataPreprocessor (`data_preprocessor/`) - Preprocesses raw graph data
- SplitGenerator (`split_generator/`) - Creates train/val/test splits. Deprecated; do not consider for planning unless explicitly asked.
- SubgraphSampler (`subgraph_sampler/`) - Samples subgraphs for training. Deprecated; do not consider for planning unless explicitly asked.
- Trainer (`training/`) - V1 trainer and V2 GLT trainer
- Inferencer (`inference/`) - Model inference
- PostProcessor (`post_process/`) - Post-processing results
GiGL extends GraphLearn-for-PyTorch (GLT) for distributed GNN training. Key class hierarchy:
- `DistDataset` (extends `graphlearn_torch.distributed.DistDataset`) - Core data container adding link prediction labels, split metadata, and feature info
- `DistNeighborLoader` (extends GLT `DistLoader`) - Standard node-based sampling loader
- `DistABLPLoader` (extends GLT `DistLoader`) - Anchor-Based Link Prediction sampling loader
- `BaseGiGLSampler` (extends GLT `DistNeighborSampler`) - Base class with shared input preparation (ABLP support)
- `DistNeighborSampler` (extends `BaseGiGLSampler`) - K-hop neighbor sampling with ABLP support
- `DistPPRNeighborSampler` (extends `BaseGiGLSampler`) - PPR-based neighbor sampling with ABLP support
Two deployment modes:
- Colocated: Data and compute on the same nodes. Each rank has a local partition of the graph/features.
- Graph Store: Separate storage and compute clusters. Storage nodes run `DistServer`; compute nodes use `RemoteDistDataset` via RPC. Scales to 100+ nodes using sequential per-node initialization to avoid GLT's ThreadPoolExecutor bottleneck.
Data flow:

```
dataset_factory.build_dataset() → DistDataset (partitioned via DistPartitioner)
  → DistNeighborLoader / DistABLPLoader → sampled subgraph batches (Data/HeteroData)
  → Model training loop
```
Key files:

- `dist_dataset.py` - Core dataset structure with IPC serialization
- `distributed_neighborloader.py` - `DistNeighborLoader` (both modes)
- `dist_ablp_neighborloader.py` - `DistABLPLoader` (both modes)
- `base_sampler.py` - `BaseGiGLSampler` with shared ABLP input preparation
- `dist_neighbor_sampler.py` - K-hop neighbor sampling (`DistNeighborSampler`)
- `dist_ppr_sampler.py` - PPR-based neighbor sampling (`DistPPRNeighborSampler`)
- `dataset_factory.py` - Dataset building and partitioning orchestration
- `graph_store/dist_server.py` - Storage server for Graph Store mode
- `graph_store/remote_dist_dataset.py` - Client-side dataset proxy
- `graph_store/compute.py` - RPC utilities (`request_server`, `async_request_server`)
- `utils/neighborloader.py` - `SamplingClusterSetup` enum, `DatasetSchema`
Graph types: Supports homogeneous, heterogeneous, and "labeled homogeneous" (heterogeneous with one default node type + label edge types, treated as homogeneous for sampling).
- `gigl/common/` - Shared utilities: `Uri` types (`GcsUri`, `HttpUri`, `LocalUri`), `Logger`, services, metrics
- `gigl/orchestration/` - Kubeflow pipeline compilation and local orchestration
- `gigl/nn/` - Neural network modules
- `gigl/src/common/types/pb_wrappers/` - Protobuf wrapper classes
- `gigl/src/mocking/` - Dataset asset mocking for tests
Two Scala projects live under `scala/` and `scala_spark35/`, built with SBT. These are legacy and not the focus of active development.
- Task configs: YAML files defining the GNN pipeline (examples in `examples/`)
- Resource configs: YAML files for GCP resources (e.g., `deployment/configs/`)
- Test resource config: `deployment/configs/unittest_resource_config.yaml`
- Use explicit, unabbreviated variable names. When in doubt, spell it out. Shortened names are OK only for universally understood abbreviations (`i`, `e`, `url`, `id`, `config`) or to avoid shadowing.
- Use OOP for model architectures, functional style for data transforms/pipelines.
- Prioritize re-using and refactoring existing code over implementing new code.
- Use `dict[key]` (bracket access) when the key must exist. Only use `.get(key, default)` when absence is a valid, expected case with a meaningful default.
- Validate preconditions at function entry. Raise explicit exceptions rather than silently continuing with bad data.
- Always use type annotations for function parameters and return values.
- Prefer native types (`dict[str, str]`, `list[int]`) over `typing.Dict`, `typing.List`.
- Use `Final` for constants. Use `@dataclass(frozen=True)` for immutable data containers when named fields and a stable shape add real clarity; do not introduce a dataclass for tiny internal-only plumbing.
- Always annotate empty containers: `names: list[str] = []`, not `names = []`.
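A minimal sketch illustrating several of these conventions together (the names below are illustrative, not from the GiGL codebase):

```python
from dataclasses import dataclass
from typing import Final

# Constants use Final.
DEFAULT_BATCH_SIZE: Final[int] = 512

# Immutable data container with named fields and a stable shape.
@dataclass(frozen=True)
class NodeTypeInfo:
    node_type: str
    feature_dim: int

def feature_dim_for(node_type: str, registry: dict[str, NodeTypeInfo]) -> int:
    """Return the feature dimension for a node type that must be registered."""
    # Validate preconditions at entry; fail loudly rather than continuing with bad data.
    if node_type not in registry:
        raise KeyError(f"Unknown node type: {node_type!r}")
    # Bracket access: at this point the key is required to exist.
    return registry[node_type].feature_dim

# Empty containers are annotated explicitly.
node_types: list[str] = []
```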
Add Google-style docstrings for all public functions and methods. Include: one-line summary, optional details, Example
with >>> for doctests, Args, Returns, and Raises. Docstrings should be Sphinx-compatible.
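A hedged example of the expected docstring shape (the function itself is hypothetical, not from the codebase):

```python
import math

def scale_features(values: list[float], factor: float) -> list[float]:
    """Scales each feature value by a constant factor.

    Useful for simple feature normalization before batching.

    Example:
        >>> scale_features([1.0, 2.0], factor=2.0)
        [2.0, 4.0]

    Args:
        values: Feature values to scale.
        factor: Multiplier applied to each value.

    Returns:
        A new list with each value multiplied by ``factor``.

    Raises:
        ValueError: If ``factor`` is not finite.
    """
    if not math.isfinite(factor):
        raise ValueError(f"factor must be finite, got {factor}")
    return [value * factor for value in values]
```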
Separate independent statements in docstrings with blank lines. Each distinct idea (purpose, preconditions, side effects, caveats) should be its own line. For example:
```python
# Bad
"""Computes the foo property of the baz object. Requires baz to be fooable."""

# Good
"""Computes the foo property of the baz object.

Requires baz to be fooable.
"""
```

Use GiGL's logger:

```python
from gigl.common.logger import Logger

logger = Logger()
```

- Proto definitions: `proto/snapchat/research/gbml/`. Import types from `snapchat.research.gbml`.
- Use wrapper classes for protobuf operations:
  - `GbmlConfigPbWrapper` for `gbml_config_pb2.GbmlConfig` (task config / template task config)
  - `GiglResourceConfigWrapper` for `gigl_resource_config_pb2.GiglResourceConfig`
- Deserialize protos into wrapper objects or explicit data classes as early as possible in entry-point files (ConfigPopulator, DataPreprocessor, SubgraphSampler, SplitGenerator, Trainer, Inferencer). Downstream code called by these entry points should NOT receive `GbmlConfigPbWrapper` or `GiglResourceConfigWrapper` directly.
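A sketch of the intended layering, using a plain dict and dataclass in place of the real proto wrappers (all names here are illustrative):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainerSettings:
    """Explicit settings extracted from the raw config at the entry point."""
    learning_rate: float
    num_epochs: int

def parse_trainer_settings(raw_config: dict) -> TrainerSettings:
    # Entry point: deserialize the raw config (a proto wrapper in real code)
    # into an explicit data class as early as possible.
    return TrainerSettings(
        learning_rate=float(raw_config["learning_rate"]),
        num_epochs=int(raw_config["num_epochs"]),
    )

def run_training(settings: TrainerSettings) -> str:
    # Downstream code receives the explicit data class,
    # never the config wrapper itself.
    return f"training for {settings.num_epochs} epochs at lr={settings.learning_rate}"
```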
- Use generators for large data processing.
- Use `concurrent.futures.ProcessPoolExecutor` for CPU-bound parallel tasks.
- Use GiGL's timeout utilities: `from gigl.src.common.utils.timeout import timeout`.
- Be mindful of memory in distributed settings. Delete intermediate tensors and call `gc.collect()` to prevent OOM.
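For instance, a generator keeps only one chunk in memory at a time instead of materializing every batch up front (a generic sketch, not GiGL code):

```python
from collections.abc import Iterator

def batched(items: list[int], batch_size: int) -> Iterator[list[int]]:
    """Yield fixed-size batches lazily; nothing beyond the current batch is held."""
    for start in range(0, len(items), batch_size):
        yield items[start : start + batch_size]

# Batches are produced on demand as the comprehension consumes the generator.
totals = [sum(batch) for batch in batched(list(range(10)), batch_size=4)]
```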
Define a minimal, consistent public API. Only expose stable, user-facing classes/functions through `__all__`. Keep helpers/internal logic in private modules.
Use `tests.test_assets.test_case.TestCase` as the base class, NOT `unittest.TestCase`.
- Unit tests: `tests/unit/` - Fast, isolated tests
- Integration tests: `tests/integration/` - Component interaction tests; require cloud resources
- E2E tests: Defined in `tests/e2e_tests/e2e_tests.yaml`
- Test assets: `tests/test_assets/` (configs in `configs/`, test graphs in `small_graph/`)
```python
from tests.test_assets.test_case import TestCase

class TestMyComponent(TestCase):
    @classmethod
    def setUpClass(cls) -> None:
        # Shared resources across tests
        ...

    def setUp(self) -> None:
        # Per-test setup
        ...

    def tearDown(self) -> None:
        # Per-test cleanup
        ...
```

- Test error cases with `self.assertRaises`.
- Avoid asserting on exact error message strings unless the message is load-bearing: disambiguating multiple error paths in the same function, or structured error reporting used downstream.
Mock external services using `unittest.mock` (`Mock`, `patch`, `MagicMock`). Create minimal test configs in `tests/test_assets/configs/`.
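For example, handing the code under test a `MagicMock` double for a hypothetical cloud storage client, so the unit test never touches the network:

```python
from unittest.mock import MagicMock

def fetch_blob_size(client, bucket: str, name: str) -> int:
    """Toy function under test: asks a storage client for a blob's size."""
    return client.get_blob(bucket, name).size

# The mock auto-creates get_blob and its return value; we pin the size attribute.
mock_client = MagicMock()
mock_client.get_blob.return_value.size = 1024

size = fetch_blob_size(mock_client, "my-bucket", "data.bin")
```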
Before changing a broken test, understand both the intent of the test and what caused the breakage. The correct fix depends on context:
- Update expected values when the production code changed intentionally and the test's old expectations are now stale (e.g., a function's output format changed by design).
- Update the testing strategy when the test was testing the wrong thing, or the code change reveals that the test's approach no longer makes sense (e.g., mocking internals that were refactored away).
- Fix the production code when the test is correct and the breakage reveals an actual bug.
- Fix test infrastructure when the failure is in setup, fixtures, mocks, or environment configuration rather than in the code under test (e.g., a mock that no longer matches the interface it doubles, or a missing cloud resource).
Do not blindly make tests pass. Read the test, read the diff that broke it, and choose the appropriate fix.
Plan documents live in docs/plans/ and must be date-prefixed using the format YYYYMMDD-<short-description>.md (e.g.,
20260324-add-foo-factory.md). Use today's date at the time of creation.
Ground every claim in a plan with a concrete reference: file paths with line numbers (e.g., foo.py:42), API
signatures, log output, or test results. Do not assert that code behaves a certain way without pointing to the evidence.
If a plan references a function, class, or config value, include where it is defined so readers can verify.
- For a pre-submit checklist and formatting, see `.claude/formatting.md`
- For general development and branch naming conventions, see `.claude/development.md`
- When migrating code, make sure to migrate any doc comments or diagrams over to the new code.