This will download the datasets to your device and cache them. You may specify the download location by setting the environment variable `POLYGRAPH_CACHE_DIR`.
All datasets in `polygraph` contain PyTorch-geometric objects.
However, we may also access the graphs as NetworkX objects as follows:
```python
print(planar_nx[0])  # (Networkx) Graph with 64 nodes and 177 edges
```
## Comparing Distributions of Graphs
When evaluating graph generative models, we want to compare a set of *generated* graphs to a set of *reference* graphs (typically the test set).
In `polygraph`, we provide various different metrics to quantify how similar these two sets of graphs are.
We usually pass collections of NetworkX graphs to metrics.
Below, we demonstrate how a set of these metrics, combined in the [`MMD2CollectionGaussianTV`][polygraph.metrics.MMD2CollectionGaussianTV] benchmark, may be computed:
```python
print(benchmark.compute(generated))  # Dictionary of different metrics
```
We discuss available metrics [in the next tutorial](metrics_overview.md).
All metrics are evaluated in a similar fashion, as defined by the common [interface](../api_reference/metrics/interface.md):
- We first initialize a metric object via `benchmark = MMD2CollectionGaussianTV(reference)`. This fits the metric to the `reference` set, caching data that is required in later computations.
- We then compute the metric against the generated set via `benchmark.compute(generated)`.
We now proceed to give a high-level overview of the different types of metrics.
## Maximum Mean Discrepancy
[Maximum Mean Discrepancy (MMD)](../api_reference/metrics/mmd.md) is the predominant method for comparing graph distributions.
The two distributions are embedded in a reproducing kernel Hilbert space (RKHS) and their distance is then computed in this space.
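Concretely, given a kernel $k$, the squared MMD between the reference distribution $P$ and the generated distribution $Q$ is the standard kernel two-sample quantity

$$
\mathrm{MMD}^2(P, Q) = \mathbb{E}_{x, x' \sim P}\big[k(x, x')\big] + \mathbb{E}_{y, y' \sim Q}\big[k(y, y')\big] - 2\,\mathbb{E}_{x \sim P,\; y \sim Q}\big[k(x, y)\big],
$$

which estimators approximate by replacing the expectations with averages over the two sets of graphs.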
In `polygraph`, we bundle the most commonly used MMD metrics in two benchmark classes: [`MMD2CollectionGaussianTV`][polygraph.metrics.MMD2CollectionGaussianTV] and [`MMD2CollectionRBF`][polygraph.metrics.MMD2CollectionRBF]. These benchmarks may be evaluated in the following fashion:
```python
from polygraph.datasets import PlanarGraphDataset, SBMGraphDataset
from polygraph.metrics import MMD2CollectionGaussianTV, MMD2IntervalCollectionGaussianTV
```
For more details on these collections we refer to the documentation on the [Gaussian TV metrics](../metrics/gaussian_tv_mmd.md) and [RBF metrics](../metrics/rbf_mmd.md).
Polygraph also allows you to construct custom MMD metrics. To construct an MMD metric, one must choose two components:
- [Descriptor](../api_reference/utils/graph_descriptors.md) - A function that transforms graphs into vectorial descriptions
- [Kernel](../api_reference/utils/graph_kernels.md) - A kernel function operating on the vectors produced by the descriptor
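To make the two roles concrete, here is a minimal self-contained sketch in plain NumPy/NetworkX (illustrative only, not the `polygraph` API): a hypothetical degree-histogram descriptor paired with a linear kernel.

```python
# Illustrative sketch: a "descriptor" maps a graph to a vector,
# and a "kernel" compares two such vectors.
import networkx as nx
import numpy as np

def degree_histogram_descriptor(g, max_degree=10):
    """Map a graph to a fixed-length, normalized degree histogram."""
    counts = np.bincount([d for _, d in g.degree()], minlength=max_degree + 1)
    counts = counts[: max_degree + 1].astype(float)
    return counts / counts.sum()

def linear_kernel(x, y):
    """Inner product of two descriptor vectors."""
    return float(np.dot(x, y))

g1, g2 = nx.cycle_graph(6), nx.path_graph(6)
k = linear_kernel(degree_histogram_descriptor(g1), degree_histogram_descriptor(g2))
```

Any function with these two shapes (graph to vector, vector pair to scalar) can play the corresponding role in an MMD metric.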
We implement a large number of different descriptors and kernels in `polygraph`.
An MMD metric operating on orbit counts with a linear kernel may thus be constructed in the following fashion:
```python
metric = DescriptorMMD2(...)
```
The MMD may be computed via a biased estimator (`"biased"`) or via an unbiased one (`"umve"`).
In the large sample size limit, the two should converge to the same value. However, at low sample sizes the differences may be substantial.
In practice, the biased estimator is often used. For further details, we refer to the documentation of the [base MMD classes](../api_reference/metrics/mmd.md).
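The behavior of the two estimators can be sketched in plain NumPy (an illustrative stand-in, not the `polygraph` implementation), here with a linear kernel on descriptor matrices `X` and `Y`:

```python
# Illustrative NumPy sketch of the two MMD^2 estimators,
# using a linear kernel on descriptor matrices X and Y.
import numpy as np

def mmd2(X, Y, estimator="biased"):
    Kxx, Kyy, Kxy = X @ X.T, Y @ Y.T, X @ Y.T
    m, n = len(X), len(Y)
    if estimator == "biased":
        # V-statistic: includes the diagonal self-similarity terms.
        return Kxx.mean() + Kyy.mean() - 2.0 * Kxy.mean()
    # Unbiased U-statistic: excludes the diagonal terms.
    return (
        (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
        + (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
        - 2.0 * Kxy.mean()
    )

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 8))  # stand-in reference descriptors
Y = rng.normal(0.5, 1.0, size=(200, 8))  # stand-in generated descriptors
biased, unbiased = mmd2(X, Y, "biased"), mmd2(X, Y, "umve")
```

Note that the biased estimate is non-negative by construction, while the unbiased one may dip below zero when the two distributions are very close.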
!!! warning
    MMD metrics that are computed with different estimators, metrics, or kernels lie on different scales and are not comparable to each other.
## PolyGraphScore
The [PolyGraphScore metric](../api_reference/metrics/polygraphscore.md) compares two graph distributions by determining how well they can be distinguished by a binary classifier.
It aims to make metrics comparable across graph descriptors and produces interpretable values between 0 and 1.
The PolyGraphScore is computed for several graph descriptors and produces a summary metric for these descriptors.
This summary metric is an estimated lower bound on a probability metric that is intrinsic to the graph distributions and independent of the descriptors.
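The underlying idea can be illustrated with a self-contained sketch (plain NumPy, not the `polygraph` implementation): train a binary classifier to tell descriptor vectors of the two sets apart and measure its accuracy on held-out data, where 0.5 means the sets are indistinguishable. For the Bayes-optimal classifier, accuracy relates to total variation distance via $\text{acc} = (1 + \mathrm{TV}) / 2$, which is why held-out classifier performance yields a lower-bound estimate of a probability metric.

```python
# Illustrative classifier two-sample test: a nearest-centroid classifier
# stands in for the binary classifier used by the real metric.
import numpy as np

def two_sample_accuracy(X_ref, X_gen, train_frac=0.5):
    rng = np.random.default_rng(0)

    def split(X):
        X = X[rng.permutation(len(X))]
        cut = int(train_frac * len(X))
        return X[:cut], X[cut:]

    ref_tr, ref_te = split(X_ref)
    gen_tr, gen_te = split(X_gen)
    mu_ref, mu_gen = ref_tr.mean(0), gen_tr.mean(0)  # fit nearest-centroid classifier

    def predict_ref(X):
        return np.linalg.norm(X - mu_ref, axis=1) < np.linalg.norm(X - mu_gen, axis=1)

    # Held-out accuracy over both classes.
    correct = predict_ref(ref_te).sum() + (~predict_ref(gen_te)).sum()
    return correct / (len(ref_te) + len(gen_te))

rng = np.random.default_rng(1)
acc_same = two_sample_accuracy(rng.normal(size=(400, 4)), rng.normal(size=(400, 4)))
acc_diff = two_sample_accuracy(rng.normal(size=(400, 4)), rng.normal(2.0, 1.0, (400, 4)))
```

Identically distributed sets yield accuracy near 0.5, while clearly different sets drive it toward 1.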
We provide [`PGS5`][polygraph.metrics.PGS5], a standardized version of the PolyGraphScore that combines 5 different graph descriptors:
```python
from polygraph.datasets import PlanarGraphDataset, SBMGraphDataset
```
As with MMD metrics, you may also construct custom PolyGraphScore variants using other graph descriptors, evaluation metrics, or binary classification approaches.
For example, you may construct the following metric:
```python
from polygraph.utils.graph_descriptors import OrbitCounts, SparseDegreeHistogram
```
All synthetic datasets in the `polygraph` package provide a static `is_valid` function.
If no validity function is available for your dataset, `validity_fn` may be set to `None`. In this case, only the fraction of unique and novel graphs is computed.
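To illustrate what the unique and novel fractions measure, here is a self-contained sketch (plain NetworkX, not the `polygraph` implementation) that approximates them by comparing Weisfeiler-Lehman graph hashes; collisions between non-isomorphic graphs are possible but rare.

```python
# Illustrative sketch: fraction of unique generated graphs and of generated
# graphs that are novel (not present in the training set), via WL hashes.
import networkx as nx

def unique_novel_fractions(generated, training):
    gen_hashes = [nx.weisfeiler_lehman_graph_hash(g) for g in generated]
    train_hashes = {nx.weisfeiler_lehman_graph_hash(g) for g in training}
    unique = len(set(gen_hashes)) / len(gen_hashes)
    novel = sum(h not in train_hashes for h in gen_hashes) / len(gen_hashes)
    return unique, novel

generated = [nx.cycle_graph(5), nx.cycle_graph(5), nx.path_graph(5)]
training = [nx.path_graph(5)]
unique, novel = unique_novel_fractions(generated, training)
```

Here, one generated graph is a duplicate and one also appears in the training set, so both fractions come out below 1.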