This repository contains three complementary approaches for analyzing the Venice street network, each revealing different aspects of the city's unique urban structure:
- Network Acquisition & Infomap Community Detection (Python) - Downloads Venice street network data and identifies natural neighborhoods using information-theoretic community detection.
- Girvan-Newman Community Detection (Python) - Alternative hierarchical community detection method for comparative analysis.
- Memory-Biased Random Walk & Spectral Dimension Analysis (MATLAB) - Characterizes the network's fractal properties and geometric scaling behavior.
- Script 1 (
venice_network_infomap.py) - Network download from OpenStreetMap and Infomap community detection with visualizations - Script 2 (
venice_girvan_newman.py) - Girvan-Newman algorithm implementation (computationally intensive, currently commented out) - Script 3 (
MBRW_SPECTRAL_DIMENSION_DEMO.m) - MATLAB implementation of MBRW with spectral dimension computation - Data Processing - Automated data acquisition, preprocessing, and format conversion
- Visualizations - Publication-quality figures showing community structure, hierarchies, and spectral properties
- Analysis Results - CSV files with community assignments, network statistics, and spectral measurements
- Requirements & Installation
- Method 1: Network Download & Infomap Community Detection
- Method 2: Girvan-Newman Community Detection
- Method 3: MBRW Spectral Dimension Analysis
- Results Interpretation
- References
pandas
osmnx
networkx
numpy
scipy
infomap
matplotlibInstallation:
pip install pandas osmnx networkx numpy scipy infomap matplotlib- MATLAB R2018b or later
- Statistics and Machine Learning Toolbox
- Boost C++ Libraries (for MBRW compilation)
- C++ compiler (g++ for Linux/Mac, MinGW or MSVC for Windows)
- Infomap executable (optional for Method 3, enhances MATLAB performance)
- Windows: Download Infomap-win64.zip
- Linux/Mac: Installation guide
-
Install Boost C++ Libraries:
- Download from boost.org
- Extract to a known location (e.g.,
C:\boost_1_89_0) - Update the
boost_pathvariable in the MATLAB script
-
Set up C++ compiler:
% In MATLAB, verify compiler setup mex -setup C++
-
Install Infomap (optional):
- Download the appropriate binary for your system
- Place in MATLAB working directory or add to system PATH
This method downloads the Venice street network from OpenStreetMap and applies the Infomap algorithm to detect natural communities within the urban structure. The analysis identifies distinct neighborhoods based on how streets and intersections are connected.
OpenStreetMap (OSM) is a collaborative project providing free geographic data worldwide. The osmnx library provides a Python interface to:
- Query OSM databases by place name or bounding box
- Download street network data with geographic coordinates
- Convert raw data into graph structures suitable for analysis
- Unique canal-constrained geography
- Well-defined historical districts (sestieri)
- Complex multi-island structure
- Ideal test case for spatial network analysis
Infomap is a community detection algorithm based on information theory and random walks. It works by:
- Simulating a random walker traversing the network
- Finding the optimal way to compress the description of this walk using a two-level codebook
- Using community structure to achieve better compression - communities are regions where the walker spends more time
Key principle: If you can describe the walker's path more efficiently by giving communities their own "local" names, then those communities are meaningful structures.
Advantages:
- Information-theoretic foundation (principled approach)
- Fast: O(E log E) complexity
- Naturally handles weighted and directed networks
- Particularly effective for spatial networks
place_name = "Venezia-Murano-Burano, Veneto, Italy"
graph = ox.graph_from_place(place_name, network_type="all")Purpose: Acquire the complete street network data for Venice from OpenStreetMap.
Parameters:
place_name: Includes Venice main islands plus Murano and Buranonetwork_type="all": Downloads all types of roads (streets, paths, walkways, bridges)
What gets downloaded:
- Nodes: Intersections and dead-ends with latitude/longitude coordinates
- Edges: Street segments connecting nodes, with attributes like street name, length, type
- Metadata: Additional information about one-way streets, surface types, etc.
Output:
- Directed graph object in memory
- Saved as
venice.graphmlfor reuse and portability
G = graph.to_undirected()
if not nx.is_connected(G):
largest_cc = max(nx.connected_components(G), key=len)
G = G.subgraph(largest_cc).copy()Purpose: Prepare the graph for community detection analysis.
Operations:
-
Convert to undirected:
- Original OSM data includes one-way streets (directed edges)
- Community detection works better on undirected graphs
- Treats all connections as bidirectional relationships
-
Extract largest connected component:
- Venice has some small isolated islands or disconnected paths
- Community detection requires a connected graph
- Largest component represents the main Venice network
- Typically retains >95% of original nodes
Why this matters: Disconnected components would be trivially identified as separate communities, which doesn't provide useful insights. We want to find structure within the connected network.
im = infomap.Infomap("--two-level --markov-time 50.0 --seed 42")Parameters Explained:
-
--two-level: Restricts algorithm to find flat partition (network → communities)- Alternative: hierarchical (network → super-communities → communities)
- Two-level is simpler to interpret and visualize
-
--markov-time 50.0: Resolution parameter controlling community size- Higher values (100+) → Larger, fewer communities
- Lower values (20-40) → Smaller, more communities
- Value of 50 chosen to balance granularity with interpretability
- Roughly corresponds to "how far can you walk before leaving the community"
-
--seed 42: Random number generator seed for reproducibility- Algorithm has stochastic elements
- Same seed = same results every time
- Critical for scientific reproducibility
node_to_int = {node: i for i, node in enumerate(node_list)}
int_to_node = {i: node for node, i in node_to_int.items()}Purpose: Convert OSM node IDs to sequential integers required by Infomap.
Why necessary:
- OSM uses large integers (e.g., 123456789) as node IDs
- Infomap expects sequential IDs (0, 1, 2, 3, ...)
- Mapping ensures compatibility and memory efficiency
Process:
- Create forward mapping: OSM_ID → Sequential_ID
- Create reverse mapping: Sequential_ID → OSM_ID
- Use forward mapping to add edges to Infomap
- Use reverse mapping to interpret results
for edge in G.edges():
source_int = node_to_int[edge[0]]
target_int = node_to_int[edge[1]]
im.add_link(source_int, target_int)
im.run()Purpose: Run the Infomap algorithm on the Venice network.
Process:
- Add all edges to Infomap instance using integer IDs
- Execute algorithm (typically completes in seconds)
- Extract results from the tree structure
Output metrics:
- Number of communities: How many distinct regions were found
- Codelength: Information-theoretic measure of compression quality (lower is better)
- Module assignments: Which community each node belongs to
infomap_communities = {
int_to_node[node_int]: module
for node_int, module in infomap_communities_int.items()
}
infomap_modularity = nx.algorithms.community.modularity(G, infomap_comm_list)Purpose: Convert results back to original node IDs and evaluate quality.
Modularity:
- Standard metric for community quality (range: -1 to 1)
- Compares actual community structure to random expectation
- Q > 0.3: Strong community structure
- Q > 0.5: Very strong community structure
- Q < 0.2: Weak or no community structure
Community statistics calculated:
- Number of communities
- Size of each community (number of nodes)
- Geographic bounds (longitude/latitude ranges)
- Network density within each community
This figure shows the raw street network of Venice, as obtained from OpenStreetMap.
- Nodes represent street intersections.
- Edges represent navigable paths connecting these intersections.
- The network captures the overall connectivity and layout of Venice’s streets, including canals and pedestrian routes.
- This visualization provides the foundation for community detection analysis, as it allows us to identify clusters of intersections that form structurally coherent regions.
# All communities shown with different colorsPurpose: Overview of the complete community structure in geographic space.
Features:
- Each community rendered in unique color (up to 20 distinct colors)
- Node size: 30 pixels for visibility
- Edges: Gray with 15% transparency to show connectivity without cluttering
- Legend: Lists all communities with their sizes
- Title: Shows number of communities and modularity score
- Each color represents a distinct community detected by the Infomap algorithm.
- Nodes located in spatial proximity are frequently assigned to the same community.
- Communities reflect structural groupings within Venice's street network.
- Larger communities correspond to areas with higher internal connectivity.
- Each point represents a street intersection (graph node).
- Lines represent navigable paths connecting these intersections.
- Identified communities correspond to meaningful structural or functional zones.
- Modularity = 0.814 , indicating the clarity of community separation.
- The inset map on the right indicates the spatial position of the displayed region within the broader Venice network.
Four complementary views in a single figure:
- All nodes colored by community membership
- Geographic layout using true coordinates
- Comprehensive view of spatial organization
- Histogram showing frequency of different community sizes
- Reveals whether structure is balanced or hierarchical
- Helps identify dominant communities
- Background nodes in light gray for context
- Five largest communities in distinct bright colors
- Focus on major structural divisions
- X-axis: Number of nodes in community
- Y-axis: Geographic extent (computed from coordinate range)
- Each point is a community, colored by ID
- Reveals compact vs. dispersed communities
Shows all detected Infomap communities projected on the physical map of Venice.
Each community is consistently color-coded throughout the figure, spatial clusters represent regions of the street network that are internally well connected.
Displays the number of nodes (street intersections) in each community.
The distribution indicates whether Venice has a hierarchical community structure or a more balanced organization of regions.
Communities 1–6 are shown in physical space, with each point representing a street intersection.
Community sizes:
- C1: 1530 nodes
- C2: 1013 nodes
- C3: 927 nodes
- C4: 870 nodes
- C5: 785 nodes
- C6: 785 nodes
These communities occupy distinct geographic regions within the city.
Geographic spread is measured in degrees.
Each point represents one community, plotted by its physical size and number of nodes.
The strong positive correlation (R = 0.842, P = 3.544e-02) indicates that communities with larger geographic extent tend to contain more intersections.
Community 2 contains more intersections than expected given its geographic size, making it an outlier.
# Top 6 largest communities shown individuallyPurpose: Detailed examination of major communities in isolation.
For each of the 6 largest communities:
- Background: All nodes in light gray (context)
- Highlight: Community nodes in red, size 40 pixels
- Internal edges: Red, thick (1.0 width)
- Statistics: Size, number of internal edges
- Geographic context: Community position relative to whole network
Interpretation:
- Shape: Linear (canal-following) vs. compact (island-centered)
- Density: Number of internal connections
- Position: Which part of Venice (north, south, main island, outer islands)
- Boundaries: How community interfaces with neighbors
Use cases:
- Identifying specific neighborhoods
- Understanding local connectivity patterns
- Finding natural entry/exit points
Two complementary views:
- Scatter plot with nodes colored by community ID
- Colorbar showing community assignments
- Clear visualization of boundaries
- Heatmap showing local node density
- Uses spatial KD-tree for efficient radius queries
- Warmer colors (red/orange) = denser street networks
- Cooler colors (yellow) = sparser areas
Interpretation:
- Compare community boundaries with density patterns
- High-density areas often form community cores
- Low-density regions may be boundaries or parks
- Reveals whether communities correspond to dense vs. sparse regions
Technical note: Density computed using radius = 0.001 (coordinate units), counting neighbors within this distance.
Format: GraphML (Graph Markup Language) - XML-based graph format
Contents:
- All nodes with their OSM IDs and coordinates
- All edges with their connections
- Metadata attributes (street names, types, etc.)
Usage:
- Can be loaded into Gephi for additional visualization
- Reusable for other analyses without re-downloading
- Portable across different graph libraries
Format: CSV with node-level community assignments
| Column | Description |
|---|---|
node_id |
Original OSM node identifier |
longitude |
East-West coordinate |
latitude |
North-South coordinate |
infomap_community |
Assigned community ID (0, 1, 2, ...) |
Usage:
- Import into GIS software (QGIS, ArcGIS)
- Further statistical analysis
- Correlation with demographic or economic data
- Validation against official boundaries
Example rows:
node_id,longitude,latitude,infomap_community
260616588,12.3377289,45.4371711,0
260616589,12.3378123,45.4372456,0
260616590,12.3375678,45.4369234,1
Format: CSV with community-level aggregated statistics
| Column | Description |
|---|---|
community_id |
Unique community identifier |
size |
Number of nodes (intersections) |
num_edges |
Number of internal connections |
density |
Network density (0 to 1) |
center_lon |
Geographic center longitude |
center_lat |
Geographic center latitude |
lon_min, lon_max |
Longitude bounding box |
lat_min, lat_max |
Latitude bounding box |
Computed metrics:
-
Density = (actual edges) / (possible edges) = E / [N(N-1)/2]
- Density = 1: Fully connected (every node connected to every other)
- Density = 0: No internal connections
- Typical range: 0.01 - 0.15 for spatial networks
-
Geographic center: Mean of all node coordinates
-
Bounding box: Smallest rectangle containing all community nodes
Usage:
- Quantitative comparison between communities
- Statistical analysis of community characteristics
- Identifying outliers (unusually large, dense, or dispersed)
The script prints detailed statistics for the top 5 largest communities:
Community 0 (Rank #1):
Size: 2847 nodes
Geographic center: (12.338456, 45.437234)
Geographic bounds:
Longitude: [12.315678, 12.356789]
Latitude: [45.425678, 45.448901]
Network statistics:
Edges: 3456
Density: 0.0856
Avg clustering: 0.2341
Interpretation:
- Size: Larger communities may represent main islands or districts
- Center: Can be mapped to actual Venice locations
- Bounds: Geographic extent of the community
- Edges: More edges = better internal connectivity
- Density: Higher = more connected (grid-like vs. tree-like)
- Clustering: Local connectivity measure (0 to 1)
This method provides an alternative hierarchical approach to community detection. Unlike Infomap's information-theoretic method, Girvan-Newman uses edge betweenness to progressively divide the network, providing multi-scale community structure analysis.
(Currently commented out due to computational intensity, but included for comparative analysis).
The Girvan-Newman algorithm is a divisive hierarchical clustering method that works by:
- Computing edge betweenness: For each edge, count how many shortest paths between all node pairs pass through it
- Removing highest betweenness edge: The edge most "between" communities is likely a bridge
- Recalculating betweenness: After removal, recompute for remaining edges
- Iterating: Repeat until network is fully decomposed
- Creating dendrogram: Track the sequence of divisions
Key insight: Edges connecting different communities will have high betweenness because they're on many shortest paths. Removing them reveals community structure.
Unlike Infomap's flat partition, Girvan-Newman produces a hierarchy:
Level 0: 2 communities (first split)
Level 1: 3+ communities (subsequent splits)
Level 2: More communities (process continues)
...
Level N: Eventually isolated nodes
This allows exploring community structure at multiple resolutions.
| Aspect | Infomap | Girvan-Newman |
|---|---|---|
| Approach | Information-theoretic (random walks, compression) | Structural (edge betweenness, shortest paths) |
| Output | Single flat partition | Hierarchical dendrogram |
| Speed | Fast: O(E log E) | Slow: O(m²n) where m=edges, n=nodes |
| Memory | Modest: O(N + E) | High: O(N + E) storage, O(N²) operations |
| Best for | Large networks, single resolution | Smaller networks, multi-scale analysis |
| Resolution control | Markov time parameter | Choose level in hierarchy |
| Deterministic | Stochastic by default, deterministic with seed | Deterministic (same input = same output) |
Computational considerations for Venice:
- Venice network (largest component): ~14,000-16,000 nodes, ~18,000-20,000 edges
- Girvan-Newman: Estimated 10-30 minutes for 15 levels (hardware dependent)
- Infomap completes same analysis in ~5 seconds
- Trade-off: Girvan-Newman provides hierarchical insights not available from Infomap
from networkx.algorithms.community import girvan_newman
gn_iterator = girvan_newman(G)
max_levels = 15 # Limit iterations for computational efficiencyPurpose: Generate community structures at multiple scales.
Process:
girvan_newman(G)returns an iterator (lazy evaluation)- Each iteration removes one edge and recalculates
max_levels = 15limits to first 15 divisions- Full decomposition (to N communities) would be impractically slow
for i, communities in enumerate(gn_iterator):
if i >= max_levels:
break
comm_list = [set(c) for c in communities]
modularity = nx.algorithms.community.modularity(G, comm_list)
gn_communities_levels.append({
'level': i,
'num_communities': len(comm_list),
'communities': comm_list,
'modularity': modularity
})Purpose: Evaluate quality of partitions at each hierarchical level.
Why track modularity:
- Not all levels in the hierarchy are equally meaningful
- Modularity peaks at the "natural" scale of community structure
- Highest modularity level is typically chosen as the best partition
Output: List of partitions with their modularity scores, allowing selection of optimal resolution.
best_level = max(gn_communities_levels, key=lambda x: x['modularity'])
gn_comm_list = best_level['communities']
gn_modularity = best_level['modularity']Purpose: Automatically select the hierarchical level with strongest community structure.
Alternative strategies:
- Choose level with specific number of communities (e.g., 6 for Venice sestieri)
- Use domain knowledge about expected structure
- Visualize entire hierarchy and manually select
- File:
venice_girvan_newman_communities.png
Figure 1: Girvan-Newman Community Detection - Complete Venice Network
What this visualization shows:
This map displays the complete Venice street network with nodes colored according to their community membership as determined by the Girvan-Newman algorithm. Each color represents a distinct community, with colors assigned by size rank (largest community gets the first color, second-largest gets the second color, etc.).
Key elements:
- Network structure: Gray edges show the complete street connectivity of Venice
- Community nodes: Colored points represent street intersections, with each color indicating community membership
- Legend: Shows the top 10 largest communities with their node counts (e.g., "#1: 3,247 nodes")
- Title information: Displays the total number of communities detected, the hierarchical level selected, and the modularity score (typically 0.3-0.5)
- Explanatory text box: Upper-left corner provides algorithm details and quality metrics
What to look for:
- Geographic coherence: Do communities form spatially contiguous regions?
- Size distribution: Is there one dominant community or several comparable ones?
- Spatial patterns: Do communities align with known Venice districts (sestieri) or waterways?
- Boundary regions: Where do different communities meet? These are typically bridges or major thoroughfares.
- File:
venice_girvan_newman_detailed_analysis.png
Figure 2: Girvan-Newman Detailed Analysis - Four-Panel Comprehensive View
What this visualization shows:
This four-panel figure provides a comprehensive analytical view of the Girvan-Newman results, combining spatial, statistical, and hierarchical perspectives.
Panel-by-panel breakdown:
Top-Left: Community Spatial Distribution
- Shows the same community map as Figure 1, but with the full Venice street network visible as background context
- Demonstrates how communities are distributed across Venice's geographic space
- Helps identify whether communities correspond to islands, neighborhoods, or other spatial features
Top-Right: Community Size Distribution (Bar Chart)
- X-axis: Community rank (1 = largest, 2 = second-largest, etc.)
- Y-axis: Number of nodes in each community
- Shows the inequality in community sizes
- Typically reveals a power-law-like distribution with a few large communities and many small ones
- Helps understand if the network has a dominant "core" community or balanced structure
Bottom-Left: Modularity vs Hierarchical Level
- X-axis: Hierarchical level (0 to 14 in our implementation)
- Y-axis: Modularity score (quality metric, range −0.5 to 1.0)
- Green line: Shows how modularity changes as more edges are removed
- Red dashed line: Marks the level with maximum modularity (the "best" partition)
- Reveals the optimal level for community detection
- Typically shows modularity rising, peaking, then declining as the network becomes over-fragmented
Bottom-Right: Number of Communities vs Level
- X-axis: Hierarchical level (0 to 14)
- Y-axis: Number of communities detected
- Orange line: Shows how the network progressively divides
- Red dashed line: Marks the optimal level (same as bottom-left)
- Demonstrates the hierarchical nature of the algorithm
- Generally increases monotonically, though the rate may vary
Key insights from this figure:
- Optimal resolution: The level where modularity peaks indicates the "natural" community structure
- Hierarchical structure: Shows whether Venice has clear multi-scale organization
- Size inequality: Reveals whether the network is dominated by one large component or has balanced communities
- Quality validation: High modularity (>0.3) indicates strong community structure
- File:
venice_girvan_newman_individual_communities.png
Figure 3: Top Six Largest Communities - Individual Detailed Views
What this visualization shows:
This six-panel figure zooms into the largest six communities individually, showing each one in detail against the backdrop of the complete Venice street network.
Panel structure:
- Each subplot shows one community (ranked #1 through #6 by size)
- Gray lines: Complete Venice street network (for geographic context)
- Colored nodes: Members of the highlighted community
- Colored edges: Internal connections within the community
Information displayed for each community:
- Title: Community ID, size rank, and node count (e.g., "Community 5 (Rank #1): 3,247 nodes")
- Edge count: Number of internal edges showing connectivity
- Geographic extent: Full Venice map maintained as reference
- Spatial coherence: Whether the community forms a contiguous region
What to analyze:
For individual communities:
- Geographic coherence: Does the community occupy a single contiguous region or multiple disconnected areas?
- Internal connectivity: Dense edge networks suggest well-connected neighborhoods
- Spatial extent: Some communities may be geographically compact, others sprawling
- Boundary definition: Clear separation from other communities or fuzzy boundaries?
Across all six panels:
- Spatial distribution: Do large communities occupy different parts of Venice (e.g., main island vs. outer islands)?
- Size-geography relationship: Are larger communities more sprawling or just more densely connected?
- Structural differences: Do some communities show grid-like patterns while others are more organic?
- Real-world correspondence: Do communities align with known Venice features like:
- Historic sestieri (San Marco, Castello, Cannaregio, etc.)
- Major islands (Murano, Burano)
- Main thoroughfares or canal systems
Typical observations for Venice:
- Largest community (Rank #1): Often covers the main island's central core
- Second/third communities: May correspond to major sestieri or outer islands
- Smaller communities: Often represent peripheral areas, bridges, or isolated island groups
- Geographic logic: Well-detected communities should make intuitive sense given Venice's canal-separated structure
This detailed view allows validation of the algorithm's results against real-world knowledge of Venice's urban geography and helps identify whether detected communities have meaningful interpretations beyond pure network statistics.
def analyze_communities(communities, name):
for i, comm in enumerate(sorted(communities, key=len, reverse=True)[:5]):
# Print statistics for top 5 communitiesPurpose: Detailed statistics for largest communities.
For each of top 10 communities:
- Size (number of nodes)
- Geographic center (longitude, latitude)
- Geographic bounds (bounding box)
- Geographic spread (spatial extent)
- Network statistics (edges, density, clustering coefficient)
Use cases:
- Understanding community structure
- Validating geographic coherence
- Comparing with expected Venice districts (sestieri)
Node-level community assignments
| Column | Description |
|---|---|
node_id |
OSM node identifier |
longitude |
Geographic coordinate (X) |
latitude |
Geographic coordinate (Y) |
girvan_newman_community |
Community ID (0 to N-1) |
Community-level statistics
| Column | Description |
|---|---|
community_id |
Community identifier |
size |
Number of nodes in community |
num_edges |
Internal edges within community |
density |
Internal edge density [0,1] |
center_lon, center_lat |
Geographic centroid |
lon_min, lon_max |
Longitude bounds |
lat_min, lat_max |
Latitude bounds |
Hierarchical level analysis
| Column | Description |
|---|---|
level |
Hierarchical level (0 to max_levels-1) |
num_communities |
Communities at this level |
modularity |
Modularity score [−0.5, 1] |
Usage:
- Track how modularity changes across hierarchy
- Identify optimal number of communities
- Visualize hierarchical decomposition process
Use if:
- Working with smaller networks (< 5,000 nodes preferred)
- Need hierarchical community structure
- Want to explore multi-scale organization
- Have computational resources (multi-core CPU, time)
- Interested in method comparison/validation
Alternative (use Infomap) if:
- Working with large networks (> 20,000 nodes)
- Only need single-resolution partition
- Limited computational time
- Primary interest is fast analysis
For Venice: Both methods are feasible, Girvan-Newman provides additional hierarchical insights at the cost of longer runtime.
Computational estimates for Venice:
- Full 15 levels: 30-120 minutes (depends on hardware)
- Single level: 2-15 minutes
- Recommendation: Run on computing cluster
To run Girvan-Newman analysis:
- Locate the commented section in the Python script
- Uncomment the entire Girvan-Newman block (remove ''' at the first and last).
- Adjust parameters if needed:
max_levels = 15 # Reduce to 2-5 for faster execution
- Run the script and monitor progress
- Compare results with Infomap
Testing recommendation: First test on a smaller network or subgraph to estimate runtime.
This MATLAB-based method characterizes the geometric and fractal properties of the Venice street network using Memory-Biased Random Walks (MBRW). While Methods 1 and 2 identify what the communities are, Method 3 reveals how the network scales across different size ranges - from individual streets to the entire city.
MBRW is an enhanced random walk process that explores network structure more effectively than simple random walks:
Standard Random Walk:
- At each step, randomly choose a neighbor
- Problem: Frequently backtracks, doesn't explore efficiently
- Like a drunk person wandering aimlessly
Memory-Biased Random Walk:
- Remembers recently visited nodes (memory)
- Heavily penalizes returning to recent locations (bias)
- Explores systematically like an intentional navigator
Mathematical formulation:
- Memory (m): Remember last m visited nodes
- Bias (b): Probability of revisiting recent nodes reduced by factor b
- Transitions: P(i→j) ∝ 1/b if j was visited recently, ∝ 1 otherwise
The spectral dimension (Dq) generalizes the concept of dimensionality for complex networks:
Intuition:
- D = 1: Network behaves like a line (1-dimensional)
- D = 2: Network behaves like a plane (2-dimensional)
- D = 3: Network behaves like a volume (3-dimensional)
- 1 < D < 2: Fractal structure (incomplete space-filling)
Technical definition: Dq measures how the "mass" (number of visited nodes) scales with "radius" (walk length):
Mass(L) ∝ L^(Dq)
where L is the walk length and q is the moment order.
Multiple dimensions (Dq spectrum):
- q → -∞: Emphasizes rarely visited regions
- q = 0: Counts distinct visited nodes
- q = 1: Entropy-based (information dimension)
- q = 2: Variance-based (correlation dimension)
- q → +∞: Emphasizes frequently visited regions
Multifractality: If Dq varies with q, the network is multifractal - different parts have different scaling properties. This indicates hierarchical organization.
Venice's unique structure creates interesting geometric properties:
- Canal constraints: Limit connectivity, creating fractal-like patterns
- Bridge bottlenecks: Create hierarchical scales (local islands, bridge connections, city-wide)
- Historical growth: Organic development leads to self-similar patterns
- Spatial embedding: Unlike online networks, Venice is constrained to 2D space
Expected findings:
- Dq ≈ 1.5-2.2: Between linear and fully planar
- ΔDq > 0: Multifractal behavior from heterogeneous structure
- Correlations: Spectral dimension changes may align with community boundaries
venice_graphml_file = 'networks/venice.graphml';Purpose: Load the Venice network exported from the Python script.
Streaming parser implementation:
fid = fopen(venice_graphml_file, 'r', 'n', 'UTF-8');
while ~feof(fid)
line = fgetl(fid);
% Parse nodes and edges line-by-line
endWhy streaming?
- GraphML files can be 50-200 MB for large networks
- Loading entire file into memory inefficient
- Line-by-line processing keeps memory footprint low
- Enables processing networks larger than available RAM
What gets extracted:
- Node IDs: Unique identifiers from GraphML
- Edge list: Connections between nodes
- Coordinates: LINESTRING geometries from edges
- First coordinate → source node position
- Last coordinate → target node position
- Geographic data: Longitude and latitude for visualization
Node ID remapping:
node_to_int = containers.Map(node_ids, 1:node_count);OSM IDs are large random integers (e.g., 5698234712). MATLAB graphs work better with sequential IDs (1, 2, 3, ...). Remapping saves memory and improves performance.
while true
node_degrees = degree(G_2core);
low_degree_nodes = find(node_degrees < 2);
if isempty(low_degree_nodes)
break;
end
G_2core = rmnode(G_2core, low_degree_nodes);
endPurpose: Extract the network "skeleton" by removing dead-end streets.
What is a k-core?
- A k-core is the maximal subgraph where every node has degree ≥ k
- 2-core means every node has at least 2 connections
- Removes "whiskers" (degree-1 nodes) and isolated edges
Why this is necessary for MBRW:
- Dead ends bias results: Random walker gets trapped in cul-de-sacs
- Artificial features: Many dead ends are data artifacts (address points, parking lots)
- Core structure: 2-core represents the true interconnected street network
- Computational efficiency: Fewer nodes → faster MBRW simulation
Iterative process:
- Removing degree-1 nodes can create new degree-1 nodes
- Must iterate until no more removals possible
- Typically converges in 2-5 iterations
Statistics printed:
Iteration 1: Removed 3245 nodes with degree < 2
Iteration 2: Removed 456 nodes with degree < 2
Iteration 3: Removed 23 nodes with degree < 2
2-core graph has 28543 nodes and 41267 edges
cc_bins = conncomp(G_2core);
[largest_size, largest_cc_idx] = max(component_sizes);
G = subgraph(G_2core, largest_cc_nodes);Purpose: Ensure analysis is performed on a single connected network.
Why necessary:
- Venice includes small disconnected islands (Torcello, Sant'Erasmo, etc.)
- MBRW cannot traverse between disconnected components
- Analysis on disconnected graphs gives misleading results
- Largest component typically contains >98% of nodes
Component analysis:
component_sizes = zeros(num_components, 1);
for i = 1:num_components
component_sizes(i) = sum(cc_bins == i);
endPrints all component sizes to identify if any significant portions are being excluded.
Output: Clean, connected network saved as venice_2core_lcc.txt in edge list format:
1 2
1 3
2 4
...
This format is required by the MBRW C++ executable.
markov_times = [50, 70]; % add more/less if neccessary...
infomap_command = sprintf('"%s" "%s" infomap_output --undirdir --two-level --markov-time %d --num-trials 10', ...
infomap_exe, infomap_input_file, mt);Purpose: Detect communities in MATLAB environment for integration with MBRW analysis.
Why run Infomap in MATLAB?
- Independent validation of Python results
- Enables correlation analysis with spectral dimensions
- Allows integrated visualizations in single environment
- Tests robustness across different implementations
Export to Pajek format:
fprintf(fid, '*Vertices %d\n', numnodes(G));
fprintf(fid, '*Edges\n');Pajek (.net) format is Infomap's native input format, simpler than GraphML.
Parameter sweep: Tests multiple Markov time values to find configuration producing 6 communities (Venice's traditional sestieri):
- MT = 50: Typically 8-12 communities
- MT = 70: Typically 5-8 communities
- Tries 10 independent runs (
--num-trials 10) for robustness
Version compatibility handling:
% Try different flags for Infomap compatibility
[status, cmdout] = system(infomap_command);
if status ~= 0
% Try alternative flag
infomap_command = sprintf('... --undirdir ...');
endDifferent Infomap versions use different command-line flags. Code tries multiple variations.
Fallback mechanism: If Infomap executable fails, falls back to native MATLAB implementation:
[infomap_communities, infomap_modularity] = louvain_communities_matlab(G, 6);Uses k-means clustering on node degrees as simple alternative.
h1 = plot(G, 'XData', plot_x, 'YData', plot_y, ...
'EdgeColor', [0.7 0.7 0.7], 'EdgeAlpha', 0.3, ...
'MarkerSize', 6, 'LineWidth', 0.5);
h1.NodeCData = infomap_communities;Purpose: Create publication-quality visualizations using true geographic coordinates.
Coordinate mapping challenge: After 2-core extraction and LCC selection, node indices change. Need to carefully map:
- Original GraphML node IDs → Coordinates
- Filtered graph nodes → Original IDs
- Coordinates → Plot positions
Multi-panel figure:
Panel 1: Network with Communities
- Geographic layout (longitude/latitude)
- Nodes colored by community
- Semi-transparent edges for context
Panel 2: Size Distribution
- Histogram of community sizes
- Shows whether communities are balanced
Panel 3: Ranked Sizes
- Bar chart sorted by size
- Identifies hierarchy
Panel 4: Parameter Sweep
- Dual y-axis showing communities vs. modularity
- Helps choose optimal Markov time
Panel 5: Statistics Table
- Text listing all communities
- Sizes and modularity metrics
Figure 7: MATLAB multi-panel community detection analysis
figure('Position', [100 100 1200 900], 'Name', 'Venice Communities - Geographic View');
h = plot(G, 'XData', plot_x, 'YData', plot_y, ...);
colormap(hsv(num_infomap_communities));Purpose: High-quality geographic visualization for presentations.
Features:
- True Venice coordinates: Accurate geographic layout
- HSV colormap: Maximum color distinction between communities
- High transparency edges: Emphasize nodes over connections
- Proper axis labels: Longitude and Latitude
- Colorbar: Shows community IDs
- High resolution: 300 DPI for publication quality
Coordinate validation:
valid_coords = ~isnan(plot_x) & ~isnan(plot_y);
coords_available = sum(valid_coords);Checks what percentage of nodes have valid coordinates. Requires >30% for geographic plot.
Subgraph for valid coordinates: If some nodes lack coordinates, creates subgraph with only nodes that have geographic data:
G_geo = subgraph(G, nodes_with_coords);This ensures clean visualization without missing data artifacts.
Figure 8: High-quality geographic community visualization
bias = 10000;
memory = 5;
rand_seed = 0;
log_num_steps = 23;Purpose: Configure the memory-biased random walk simulation.
Parameter explanations:
bias = 10000 (Very high exploration bias)
- Probability of returning to recently visited node reduced by factor 10,000
- Effect: Walker strongly prefers unexplored regions
- Why high: Venice is dense, need strong bias to avoid local trapping
- Lower values (100-1000): More local exploration
- Higher values (50000+): Almost never revisit
memory = 5 (Remember last 5 nodes)
- Walker tracks last 5 visited nodes
- Prevents immediate backtracking: can't return to last 5 nodes easily
- Simulates intelligent navigation
- Typical range: 3-7
- Too low (1-2): Excessive backtracking
- Too high (10+): Too restrictive, may get stuck
rand_seed = 0 (Reproducibility)
- Controls random number generator
- Same seed = identical results
- Critical for scientific reproducibility
- Change to test sensitivity
log_num_steps = 23 (Simulate up to 2²³ ≈ 8.4 million steps)
- Covers scales from 1 edge to full network diameter
- Need wide range to observe scaling behavior
- Venice diameter: ~100-200 hops
- Factor of 1000 safety margin ensures full coverage
- Each doubling explores one more scale
Orders (moment orders q):
finite_orders = -19:19;
orders = [-Inf finite_orders Inf];Computes spectral dimensions for q from -∞ to +∞:
- q = -19 to -1: Emphasize rarely visited nodes
- q = 0: Count distinct visited nodes
- q = 1 to 19: Emphasize frequently visited nodes
- q = -∞, +∞: Extreme emphasis
compile_command = sprintf('g++ -std=c++0x -I %s %s.cpp -o %s -O2', ...
boost_path, c_executable_name, c_executable_name);Purpose: Compile and run the MBRW simulation in C++ for computational efficiency.
Why C++ instead of MATLAB?
- Speed: C++ is 50-100× faster than MATLAB for this task
- Memory: More efficient data structures
- Boost library: Provides optimized graph algorithms
- Scale: Can handle millions of random walk steps
Compilation flags:
-std=c++0x: Use C++11 standard features-I %s: Include Boost library path-O2: Optimization level 2 (good speed, reasonable compile time)- Alternative:
-O3(maximum optimization, slower compilation)
Execution:
run_command = sprintf('%s%s -i %s -q %s -o %s -b %u -m %u -r %u -c %u', ...
dot_if_linux, c_executable_name, mbrw_able_edge_file_name, ...
order_file_name, segment_mass_log_multimean_file_name, ...
bias, memory, rand_seed, log_num_steps);Command-line arguments:
-i: Input edge list file-q: Orders file (q-values to compute)-o: Output file for results-b: Bias parameter-m: Memory parameter-r: Random seed-c: log₂(max steps)
Caching:
if ~exist(segment_mass_log_multimean_file_name, 'file')
% Compile and run
endOnly recompiles/reruns if output file doesn't exist. Saves time on repeated analyses.
Typical runtime:
- Venice (30K nodes): 10-30 minutes
- Progress printed to console
- Can be interrupted and resumed
log2_generalized_means = readmatrix(segment_mass_log_multimean_file_name);Purpose: Analyze how network exploration scales with walk length.
Data structure:
- Rows: Segment lengths (2⁰, 2¹, 2², ..., 2²³)
- Columns: Different q-values (-∞, -19, -18, ..., 0, ..., 18, 19, +∞)
- Values: log₂ of generalized mean segment mass
What is segment mass? For a walk of length L:
- Divide walk into segments of length L
- For each segment, count visited nodes
- Compute generalized mean across all segments:
- M_q = [mean(mass^q)]^(1/q)
Physical interpretation:
- M₀: Number of distinct nodes visited
- M₁: Exponential of entropy (information measure)
- M₂: Standard deviation of mass distribution
- M_∞: Maximum mass (most-visited node count)
Interpretation:
- Linear regions: Power-law scaling (fractal behavior)
- Slope: Related to spectral dimension
- Curvature: Indicates finite-size effects or crossover scales
- Separation between curves: Indicates multifractality
- Parallel curves → unifractal
- Diverging curves → multifractal
Dq = spectral_dimensions_v2(log2_generalized_means, 'length', num_nodes);Purpose: Extract spectral dimensions from segment mass data.
Algorithm:
-
Identify scaling region: Find where curves are approximately linear
- Usually middle range: segment length ≈ 16 to 2048
- Avoid small L (local effects) and large L (finite-size effects)
-
Compute local slopes:
- dM/dL in log-log space
- Use finite differences or robust regression
-
Extract Dq:
- Slope of log(mass) vs. log(length) equals Dq
- Different q-values give different dimensions
Function signature:
function Dq = spectral_dimensions_v2(log2_means, method, num_nodes)Parameters:
log2_means: Input data matrixmethod: 'length' (use length-based scaling) or 'mass' (use mass-based)num_nodes: Network size for normalization
Output:
Dq: Array of spectral dimensions, one per q-value- Length = number of q-values (41 in this case)
Key metrics computed:
mean_Dq = mean(finite_order_Dqs);
std_Dq = std(finite_order_Dqs);
deltaDq = max(finite_order_Dqs) - min(finite_order_Dqs);- Mean: Average spectral dimension
- Std: Variability across q
- ΔDq: Range (multifractal measure)
Purpose: Show raw MBRW results and scaling behavior.
Features:
- X-axis: log₂(segment length) from 0 to 23
- Y-axis: log₂(q-mean segment mass)
- Multiple curves: One per q-value, color-coded
- Legend: All q-values including ±∞
- Grid: For reading values
Color scheme:
line_colors = jet(num_orders);- Blue: Negative q (rare regions)
- Green/Yellow: q near 0
- Red: Positive q (common regions)
What to look for:
- Overall trend: Should increase with length (more steps → more coverage)
- Linearity: Straight lines indicate power-law (fractal) scaling
- Curvature at large L: Finite-size effects (network has finite size)
- Spacing between curves: Larger spacing → stronger multifractality
Figure 10: MBRW segment mass log-log plot
Purpose: Visualize how spectral dimension varies with moment order.
Features:
plot(finite_orders, finite_order_Dqs, 'b-o', 'LineWidth', 2, 'MarkerSize', 6)- X-axis: q from -19 to +19
- Y-axis: Spectral dimension Dq
- Blue line with circles: Easy to read
- Grid: Precise value reading
- Text box: Mean ± std deviation
Typical patterns:
Decreasing Dq with increasing q:
q=-19: D=2.1
q=0: D=1.9
q=19: D=1.7
Indicates heterogeneous structure - dense regions (high q) have lower dimension.
Flat Dq profile:
All q: D≈1.9 ± 0.05
Indicates uniform, homogeneous network structure.
Sharp transitions:
q<0: D=2.0
q>0: D=1.5
May indicate distinct structural regimes or phase transitions.
Figure 11: Spectral dimension spectrum (Dq vs q)
Interpretation for Venice:
- Expected mean: 1.7-2.1 (constrained planar network)
- Expected ΔDq: 0.1-0.3 (moderate multifractality from canals)
- Shape: Likely decreasing (denser core, sparser periphery)
Creates comprehensive figure showing:
- Community structure (left)
- Spectral dimension profile (right)
Enables visual correlation between:
- Community boundaries
- Changes in Dq
- Geographic features
This integrated view reveals whether community structure corresponds to geometric changes in network organization.
Format: Tab-delimited edge list
1 2
1 3
2 4
3 5
...
Properties:
- Sequential node IDs (1 to N)
- One edge per line
- Undirected (each edge listed once)
- No weights or attributes
Usage:
- Input for MBRW C++ executable
- Portable format readable by many tools
- Can be loaded into NetworkX, igraph, etc.
Size: ~1-2 MB for Venice network
Format: CSV with two columns
| NodeID | Community |
|---|---|
| 1 | 0 |
| 2 | 0 |
| 3 | 1 |
| ... | ... |
Important: NodeIDs are sequential (1 to N) and correspond to the 2-core LCC network, NOT original OSM IDs.
Usage:
- Map spectral properties to communities
- Statistical analysis of community characteristics
- Correlation studies
Format: CSV matrix (no headers)
Filename encoding:
b_10000: bias parameterm_5: memory parameterr_0: random seedc_23: log₂(max_steps)
Contents:
- Rows (24): Segment lengths 2⁰ to 2²³
- Columns (41): q-values from -∞ to +∞
- Values: log₂(generalized mean segment mass)
Example:
Row 0 (L=1): [-2.3, -1.8, ..., 0.5, 1.2]
Row 10 (L=1024): [8.4, 9.1, ..., 12.3, 14.7]
Row 23 (L=8M): [15.2, 15.8, ..., 18.9, 21.3]
Usage:
- Raw data for spectral dimension computation
- Custom analysis with different methods
- Validation and reproducibility
- Comparison across networks
Size: ~50-100 KB (small, compressed data)
Format: Plain text, one value per line
-Inf
-19
-18
...
0
...
18
19
Inf
Purpose: Specifies which q-values to compute.
Usage: Input file for MBRW C++ executable.
High-resolution (300 DPI) figures:
-
venice_infomap_communities.png- Multi-panel community analysis
- Size distribution
- Parameter sweep
- Statistics
-
venice_communities_geographic.png- Geographic layout with communities
- High-quality for presentations
-
MBRW analysis figures (not auto-saved by default)
- Segment mass curves
- Spectral dimension spectrum
- Use MATLAB's export features to save
Saving additional figures:
print('figure_name', '-dpng', '-r300');Physical meaning:
| Dq Value | Interpretation |
|---|---|
| Dq ≈ 1.0 | Linear structure (chain-like) |
| Dq ≈ 1.5 | Fractal between 1D and 2D |
| Dq ≈ 2.0 | Planar structure (fills a plane) |
| Dq ≈ 2.5 | Between 2D and 3D |
| Dq ≈ 3.0 | Volumetric (space-filling) |
Expected ranges for different networks:
| Network Type | Typical Dq | ΔDq | Example |
|---|---|---|---|
| Linear chain | 1.0 | 0.0 | Highway system |
| Tree | 1.0-1.3 | 0.1-0.2 | River networks |
| Fractal | 1.3-1.8 | 0.2-0.4 | Coastlines, DLA |
| Planar grid | 2.0 | 0.0 | Manhattan streets |
| Spatial network | 1.5-2.2 | 0.1-0.3 | City streets |
| Small-world | 2.5-4.0 | 0.3-0.6 | Social networks |
For Venice specifically:
- Expected Dq: 1.7-2.1
- Lower than pure grid (2.0) due to canal constraints
- Higher than tree (1.3) due to bridge connections and islands
- Expected ΔDq: 0.15-0.35
- Moderate multifractality from heterogeneous structure
- Denser main islands, sparser outer regions
ΔDq = max(Dq) - min(Dq) measures structural heterogeneity:
-
ΔDq ≈ 0 (unifractal):
- Homogeneous structure
- Similar density everywhere
- Regular grid-like organization
- Example: Manhattan street grid
-
ΔDq = 0.1-0.3 (weakly multifractal):
- Moderate heterogeneity
- Some dense and sparse regions
- Typical for organic city growth
- Example: Most European cities
-
ΔDq > 0.5 (strongly multifractal):
- Highly heterogeneous
- Clear hierarchy of scales
- Hub-and-spoke organization
- Example: Airport networks, Internet
Venice interpretation:
- ΔDq ~ 0.2: Moderate heterogeneity expected
- Sources:
- Dense historic center (San Marco)
- Medium density residential areas
- Sparse outer islands
- Canal-induced constraints
Integration of Methods 1-3:
-
Infomap communities show modular organization
- Identifies distinct neighborhoods
- Reveals structural divisions
- Based on connectivity patterns
-
Girvan-Newman shows hierarchical structure
- Multi-scale organization
- Nested communities
- Progressive decomposition
-
MBRW spectral dimensions show geometric scaling
- How structure changes across scales
- Fractal vs. regular organization
- Dimensional properties
Combined insights:
- Community boundaries may correspond to changes in Dq
- Large communities may have higher Dq (more planar)
- Small communities may have lower Dq (more linear)
- Multifractality reflects hierarchy seen in Girvan-Newman
Analysis workflow:
- Identify communities (Methods 1 & 2)
- Compute Dq for each community separately
- Compare spectral properties across communities
- Correlate with geographic features (canals, bridges)
Infomap --markov-time:
- Lower (20-40): More, smaller communities
- Medium (50-70): Balanced resolution
- Higher (100-200): Fewer, larger communities
Guidelines:
- For Venice: 50-70 works well
- For smaller networks: 30-50
- For very large networks: 70-100
Girvan-Newman max_levels:
- Controls computational time
- Each level ≈ 3-10 minutes for Venice
- 10 levels sufficient for most analyses
MBRW bias:
- Low (100-1000): More local exploration
- Use for: Small-scale structure
- Risk: Trapped in local regions
- Medium (5000-20000): Balanced
- Use for: General analysis
- Recommended: Start here
- High (50000+): Maximum exploration
- Use for: Large, complex networks
- Risk: May miss local structure
MBRW memory:
- Low (2-3): More backtracking
- Use for: Trees, sparse networks
- Medium (4-6): Balanced
- Use for: Most spatial networks
- Recommended: 5 for Venice
- High (7-10): Restricted exploration
- Use for: Dense grids
- Risk: May get stuck
log_num_steps:
- Must exceed network diameter by factor 100-1000
- Venice (~200 diameter): 23 (8.4M steps) good
- Smaller (< 50): 18-20 sufficient
- Larger (> 500): 24-25 needed
Testing parameters: Run with different values and check:
- Dq values reasonable (1-3 range)
- Curves smooth (not jagged)
- Clear scaling region visible
- Results stable across parameter variations
Python:
- Infomap:
--seed 42 - Girvan-Newman: Deterministic (no seed needed)
- NetworkX functions: Set
np.random.seed(42)
MATLAB:
- MBRW:
rand_seed = 0 - Infomap:
--seed 42in command - MATLAB random:
rng(0)
Best practices:
- Document all parameters in README
- Save parameter files with results
- Use version control for code
- Archive complete output files
Method 1 - Infomap (Python):
- Time: O(E log E) ≈ 5-30 seconds for Venice
- Memory: O(N + E) ≈ 500 MB
- Hardware: Any modern laptop
Method 2 - Girvan-Newman (Python):
- Time: O(m²n) to O(m²n²) ≈ 30-120 minutes for Venice
- Memory: O(N²) ≈ 3-8 GB
- Hardware: 8+ GB RAM recommended
Method 3 - MBRW (MATLAB):
- Time: O(T log T) where T = 2^log_num_steps
- Venice with log_num_steps=23: 10-60 minutes
- Compilation: ~1 minute (one-time)
- Memory: O(N × log_num_steps) ≈ 2-4 GB
- Hardware: 8+ GB RAM, C++ compiler required
Recommendations:
- Minimum: 8 GB RAM, dual-core 2.5+ GHz
- Recommended: 16 GB RAM, quad-core 3.0+ GHz
- Optimal: 32 GB RAM, 8-core CPU for parallel work
Speed optimization:
- Python: Use PyPy for faster execution
- MATLAB: Parallel Computing Toolbox for multiple networks
- C++: Increase optimization level to
-O3 - Hardware: SSD for faster file I/O
| Aspect | Infomap | Girvan-Newman | MBRW |
|---|---|---|---|
| What it reveals | Communities | Hierarchy | Geometry |
| Output | Flat partition | Dendrogram | Dimensions |
| Time (Venice) | 5-30 sec | 30-120 min | 10-60 min |
| Memory | 500 MB | 3-8 GB | 2-4 GB |
| Deterministic | No (use seed) | Yes | No (use seed) |
| Resolution | Markov time | Level in hierarchy | q-values |
| Best for | Fast analysis | Multi-scale | Fractal properties |
| Scalability | Excellent | Poor | Good |
| Implementation | Python | Python | MATLAB+C++ |
-
Rosvall, M., & Bergstrom, C. T. (2008). Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences, 105(4), 1118-1123.
- Original Infomap paper
-
Girvan, M., & Newman, M. E. J. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12), 7821-7826.
- Original Girvan-Newman algorithm
-
Newman, M. E. J. (2006). Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103(23), 8577-8582.
- Modularity metric definition
-
Song, C., Havlin, S., & Makse, H. A. (2005). Self-similarity of complex networks. Nature, 433(7024), 392-395.
- Introduces box-counting for networks
-
Song, C., Havlin, S., & Makse, H. A. (2006). Origins of fractality in the growth of complex networks. Nature Physics, 2(4), 275-281.
- Fractal scaling in networks
-
Daqing, L., Kosmidis, K., Bunde, A., & Havlin, S. (2011). Dimension of spatially embedded networks. Nature Physics, 7(6), 481-484.
- Spatial network dimensions
-
Shanker, O. (2007). Defining dimension of a complex network. Modern Physics Letters B, 21(6), 321-326.
- Spectral dimension formalism
-
Boeing, G. (2017). OSMnx: New methods for acquiring, constructing, analyzing, and visualizing complex street networks. Computers, Environment and Urban Systems, 65, 126-139.
- OSMnx library for street networks
-
Porta, S., Crucitti, P., & Latora, V. (2006). The network analysis of urban streets: A primal approach. Environment and Planning B, 33(5), 705-725.
- Urban street network analysis
-
Batty, M., & Longley, P. (1994). Fractal Cities: A Geometry of Form and Function. Academic Press.
- Comprehensive book on urban fractals
- Infomap: https://www.mapequation.org/infomap/
- OSMnx: https://osmnx.readthedocs.io/
- NetworkX: https://networkx.org/
- Boost C++: https://www.boost.org/
- MATLAB Graph Functions: https://www.mathworks.com/help/matlab/graph-and-network-algorithms.html
This comprehensive analysis suite demonstrates three complementary approaches to understanding urban network structure:
Method 1 (Infomap) reveals functional organization - how Venice is divided into coherent neighborhoods based on connectivity. The information-theoretic foundation provides a principled, efficient method for identifying natural divisions.
Method 2 (Girvan-Newman) reveals hierarchical structure - how communities nest within larger super-communities, showing multi-scale organization. Though computationally expensive, it validates and extends Infomap results.
Method 3 (MBRW) reveals geometric properties - how Venice's structure scales from individual streets to the entire city. This captures the fractal and hierarchical nature of urban growth constrained by canals and geography.
Together, these methods answer:
- What are the parts? (Infomap)
- How are parts organized hierarchically? (Girvan-Newman)
- How do parts fit together across scales? (MBRW)
Venice provides an ideal test case because:
- Unique geography: Canal constraints create interesting topological features
- Historical structure: Traditional sestieri (6 districts) provide validation
- Multi-island system: Natural hierarchical organization
- Well-documented: Extensive historical and geographic data available
- Tourism interest: Results have practical applications for navigation and planning
The methods developed here can be applied to:
- Urban planning: Identify natural neighborhood boundaries for administrative purposes
- Emergency services: Optimize response routes based on community structure
- Tourism management: Design walking tours that respect natural districts
- Transportation: Plan public transit routes connecting communities efficiently
- Historical analysis: Compare detected communities with historical districts
- Infrastructure: Identify critical bridges and connections between communities
- Sustainability: Understand pedestrian flow patterns for traffic management
This repository demonstrates:
- Multi-method validation: Using three independent approaches for robustness
- Scale integration: Connecting local (communities) and global (spectral) properties
- Practical implementation: Complete, reproducible workflow from data to results
- Computational efficiency: Optimizations for large-scale network analysis
- Visualization best practices: Publication-quality figures with clear interpretation
Potential extensions and research directions:
-
Temporal analysis:
- Compare community structure across different historical periods
- Track how Venice's network evolved over centuries
- Analyze seasonal variations (tourist vs. non-tourist seasons)
-
Multi-layer networks:
- Integrate different transportation modes (pedestrian, water taxi, gondola)
- Create separate layers for different network types
- Study how communities differ across transportation modes
-
Validation studies:
- Quantitative comparison with official sestieri boundaries
- Ground-truth validation with local resident surveys
- Historical map analysis for temporal validation
-
Functional correlation:
- Overlay with socioeconomic data (property values, demographics)
- Correlate with points of interest (churches, museums, shops)
- Analyze tourist flow patterns and their relationship to structure
-
Additional algorithms:
- Louvain method for comparison
- Label Propagation for speed
- Spectral clustering for geometric approach
- Ensemble methods combining multiple algorithms
-
Parameter optimization:
- Systematic grid search for optimal Markov time
- Stability analysis across parameter ranges
- Information-theoretic criteria for parameter selection
-
Hierarchical Infomap:
- Enable multi-level hierarchy (currently two-level)
- Compare with Girvan-Newman hierarchies
- Develop methods for automatic scale selection
-
Community-specific spectral dimensions:
- Compute Dq separately for each detected community
- Compare geometric properties across neighborhoods
- Identify communities with anomalous scaling
-
Temporal MBRW:
- Analyze how spectral properties changed historically
- Correlate dimension changes with urban development
- Study impact of new bridges on network geometry
-
Advanced multifractal analysis:
- Full multifractal spectrum analysis
- Hölder exponent distribution
- Singularity spectrum computation
- Identify phase transitions in network structure
-
Comparative urban analysis:
- Apply same pipeline to other canal cities (Amsterdam, Bruges, Stockholm)
- Compare Venice with non-canal cities of similar size
- Identify universal vs. Venice-specific properties
- Build typology of urban network structures
-
Network motifs:
- Identify recurring structural patterns
- Correlate motifs with community boundaries
- Study relationship between motifs and spectral properties
-
Navigation optimization:
- Use community structure for hierarchical routing
- Design shortest-path algorithms respecting communities
- Optimize tourist routes to visit multiple districts efficiently
-
Critical infrastructure:
- Identify bridges critical for inter-community connectivity
- Analyze vulnerability to edge removal (bridge closure)
- Design resilient network modifications
-
Accessibility analysis:
- Combine with elevation data (Venice flooding)
- Identify communities most vulnerable to acqua alta
- Design emergency evacuation routes
-
Community-spectral correlation:
- Statistical analysis of Dq variation across communities
- Test hypothesis: community boundaries = dimension transitions
- Develop unified theory connecting topology and geometry
-
Machine learning:
- Train models to predict community membership from local features
- Use spectral properties as features for classification
- Develop automated community detection using learned features
-
Interactive visualization:
- Web-based interface for exploring results
- 3D visualization with building heights
- Time-slider for temporal analysis
- Overlay with satellite imagery
-
Analytical models:
- Develop theoretical predictions for Dq in canal-constrained networks
- Mathematical relationship between modularity and multifractality
- Optimal network growth models reproducing Venice-like structure
-
Scaling theory:
- Finite-size scaling analysis
- Crossover phenomena between scales
- Universal scaling laws for spatial networks
Problem: ImportError: No module named 'osmnx'
Solution: pip install osmnxProblem: Infomap segmentation fault
Solution: Update infomap to latest version
pip install --upgrade infomap
Problem: NetworkX version incompatibility
Solution: Ensure NetworkX >= 2.5
pip install --upgrade networkx
Problem: Visualization plots are blank
Solution: Check matplotlib backend
import matplotlib
matplotlib.use('TkAgg') # or 'Qt5Agg'
Problem: "Cannot open file: venice.graphml"
Solution: Run Python script first to generate the file
Or check file path in MATLAB script
Problem: Boost library not found during compilation
Solution: Update boost_path variable
boost_path = 'C:\boost_1_89_0'; % Adjust path
Problem: g++ compiler not found
Solution (Windows): Install MinGW-w64
Solution (Mac): Install Xcode command line tools
Solution (Linux): sudo apt-get install build-essential
Problem: Infomap executable fails with version error
Solution: Try different command flags
--undirdir or -u or --undirected
Or use fallback native MATLAB algorithm
Problem: Out of memory during MBRW
Solution: Reduce log_num_steps from 23 to 21
Or close other applications
Or use computer with more RAM
Problem: MBRW results look wrong (Dq > 5 or < 0)
Solution: Check bias and memory parameters
Ensure network is connected (check LCC extraction)
Verify input file format (tab-delimited)
Python slow on large networks:
- Use PyPy instead of CPython
- Enable NetworkX caching
- Reduce visualization node size
- Sample nodes for preview plots
MATLAB slow on large networks:
- Increase C++ optimization:
-O3 - Use parfor for multiple parameter sets
- Reduce number of q-values tested
- Use compiled MEX files instead of m-files
Disk space issues:
- Compress GraphML files after generation
- Delete intermediate Infomap output files
- Store only essential results
- Use binary formats instead of CSV
This is an academic project. Suggestions for improvements:
-
Additional visualizations
- Interactive plots using Plotly
- 3D network embeddings
- Animated temporal evolution
-
Code optimizations
- Parallel processing for multiple cities
- GPU acceleration for large networks
- More efficient data structures
-
Documentation improvements
- Video tutorials
- Jupyter notebooks with examples
- Case studies with other cities
-
Testing
- Unit tests for core functions
- Validation on synthetic networks
- Continuous integration setup
To contribute:
- Fork the repository
- Create feature branch
- Add tests and documentation
- Submit pull request
If you use this code in your research, please cite:
@software{venice_network_analysis,
author = {[Your-Name]},
title = {Venice Street Network Analysis Suite: Community Detection and Spectral Dimension Analysis},
year = {CURRENT-YEAR},
url = {[Your Repository URL]}
}And cite the original methods:
@article{rosvall2008infomap,
title={Maps of random walks on complex networks reveal community structure},
author={Rosvall, Martin and Bergstrom, Carl T},
journal={Proceedings of the National Academy of Sciences},
volume={105},
number={4},
pages={1118--1123},
year={2008}
}
@article{song2005fractals,
title={Self-similarity of complex networks},
author={Song, Chaoming and Havlin, Shlomo and Makse, Hern{\'a}n A},
journal={Nature},
volume={433},
number={7024},
pages={392--395},
year={2005}
}
@article{boeing2017osmnx,
title={OSMnx: New methods for acquiring, constructing, analyzing, and visualizing complex street networks},
author={Boeing, Geoff},
journal={Computers, Environment and Urban Systems},
volume={65},
pages={126--139},
year={2017}
}This project is released under the MIT License for academic and research purposes.
The Venice street network data is © OpenStreetMap contributors, available under the Open Database License (ODbL).
- OpenStreetMap contributors for providing street network data
- Martin Rosvall for the Infomap algorithm and software
- Geoff Boeing for the OSMnx library
- Hernán Makse and collaborators for MBRW methodology
- NetworkX development team for network analysis tools
- MATLAB for scientific computing platform
- Boost C++ Libraries for efficient data structures
For questions about:
- Infomap algorithm: See Infomap documentation
- OSMnx library: See OSMnx documentation
- MBRW methodology: See cited papers (Song et al., Daqing et al.)
- This implementation: Contact course instructor Uri Hershberg or Loay Egbaria : Loayegb@gmail.com (Student for CS).
Bug reports: Include:
- Operating system and version
- Python/MATLAB version
- Library versions (
pip listorverin MATLAB) - Complete error message
- Minimal reproducible example
Feature requests: Describe:
- Use case and motivation
- Expected behavior
- Example with other networks
- Performance requirements
Version 1.0 (Initial Release)
- Complete Infomap community detection pipeline
- Girvan-Newman alternative method
- MBRW spectral dimension analysis
- Comprehensive visualizations
- Full documentation
Planned for Version 1.1:
- Jupyter notebook tutorials
- Additional validation metrics
- Performance optimizations
- Interactive visualizations
- Cross-city comparison tools
For academic use in network science, urban analytics, and complex systems research. Developed for Vinece Project and researches.
Last updated: 15/11/2025












