Venice Network Analysis

Overview

This repository contains three complementary approaches for analyzing the Venice street network, each revealing different aspects of the city's unique urban structure:

Network Acquisition & Infomap Community Detection (Python) - Downloads Venice street network data and identifies natural neighborhoods using information-theoretic community detection.
Girvan-Newman Community Detection (Python) - Alternative hierarchical community detection method for comparative analysis.
Memory-Biased Random Walk & Spectral Dimension Analysis (MATLAB) - Characterizes the network's fractal properties and geometric scaling behavior.

What This Repository Contains

Script 1 (venice_network_infomap.py) - Network download from OpenStreetMap and Infomap community detection with visualizations
Script 2 (venice_girvan_newman.py) - Girvan-Newman algorithm implementation (computationally intensive, currently commented out)
Script 3 (MBRW_SPECTRAL_DIMENSION_DEMO.m) - MATLAB implementation of MBRW with spectral dimension computation
Data Processing - Automated data acquisition, preprocessing, and format conversion
Visualizations - Publication-quality figures showing community structure, hierarchies, and spectral properties
Analysis Results - CSV files with community assignments, network statistics, and spectral measurements

Requirements & Installation

Python Requirements (Methods 1 & 2)

pandas
osmnx
networkx
numpy
scipy
infomap
matplotlib

Installation:

pip install pandas osmnx networkx numpy scipy infomap matplotlib

MATLAB Requirements (Method 3)

MATLAB R2018b or later
Statistics and Machine Learning Toolbox
Boost C++ Libraries (for MBRW compilation)
C++ compiler (g++ for Linux/Mac, MinGW or MSVC for Windows)

External Tools

Infomap executable (optional for Method 3, enhances MATLAB performance)
- Windows: Download Infomap-win64.zip
- Linux/Mac: Installation guide

MATLAB Setup Instructions

Install Boost C++ Libraries:
- Download from boost.org
- Extract to a known location (e.g., C:\boost_1_89_0)
- Update the boost_path variable in the MATLAB script

Set up C++ compiler:

% In MATLAB, verify compiler setup
mex -setup C++

Install Infomap (optional):
- Download the appropriate binary for your system
- Place in MATLAB working directory or add to system PATH

Method 1: Network Download & Infomap Community Detection

Overview

This method downloads the Venice street network from OpenStreetMap and applies the Infomap algorithm to detect natural communities within the urban structure. The analysis identifies distinct neighborhoods based on how streets and intersections are connected.

Methodology

Network Data Acquisition

OpenStreetMap (OSM) is a collaborative project providing free geographic data worldwide. The osmnx library provides a Python interface to:

Query OSM databases by place name or bounding box
Download street network data with geographic coordinates
Convert raw data into graph structures suitable for analysis

Why Venice?

Unique canal-constrained geography
Well-defined historical districts (sestieri)
Complex multi-island structure
Ideal test case for spatial network analysis

What is Infomap?

Infomap is a community detection algorithm based on information theory and random walks. It works by:

Simulating a random walker traversing the network
Finding the optimal way to compress the description of this walk using a two-level codebook
Using community structure to achieve better compression - communities are regions where the walker spends more time

Key principle: If you can describe the walker's path more efficiently by giving communities their own "local" names, then those communities are meaningful structures.

Advantages:

Information-theoretic foundation (principled approach)
Fast: O(E log E) complexity
Naturally handles weighted and directed networks
Particularly effective for spatial networks

Code Workflow

Step 1: Network Data Download

place_name = "Venezia-Murano-Burano, Veneto, Italy"
graph = ox.graph_from_place(place_name, network_type="all")

Purpose: Acquire the complete street network data for Venice from OpenStreetMap.

Parameters:

place_name: Includes Venice main islands plus Murano and Burano
network_type="all": Downloads all types of roads (streets, paths, walkways, bridges)

What gets downloaded:

Nodes: Intersections and dead-ends with latitude/longitude coordinates
Edges: Street segments connecting nodes, with attributes like street name, length, type
Metadata: Additional information about one-way streets, surface types, etc.

Output:

Directed graph object in memory
Saved as venice.graphml for reuse and portability

Step 2: Graph Preprocessing

G = graph.to_undirected()

if not nx.is_connected(G):
    largest_cc = max(nx.connected_components(G), key=len)
    G = G.subgraph(largest_cc).copy()

Purpose: Prepare the graph for community detection analysis.

Operations:

Convert to undirected:
- Original OSM data includes one-way streets (directed edges)
- Community detection works better on undirected graphs
- Treats all connections as bidirectional relationships
Extract largest connected component:
- Venice has some small isolated islands or disconnected paths
- Community detection requires a connected graph
- Largest component represents the main Venice network
- Typically retains >95% of original nodes

Why this matters: Disconnected components would be trivially identified as separate communities, which doesn't provide useful insights. We want to find structure within the connected network.

Step 3: Infomap Algorithm Configuration

im = infomap.Infomap("--two-level --markov-time 50.0 --seed 42")

Parameters Explained:

--two-level: Restricts algorithm to find flat partition (network → communities)
- Alternative: hierarchical (network → super-communities → communities)
- Two-level is simpler to interpret and visualize
--markov-time 50.0: Resolution parameter controlling community size
- Higher values (100+) → Larger, fewer communities
- Lower values (20-40) → Smaller, more communities
- Value of 50 chosen to balance granularity with interpretability
- Roughly corresponds to "how far can you walk before leaving the community"
--seed 42: Random number generator seed for reproducibility
- Algorithm has stochastic elements
- Same seed = same results every time
- Critical for scientific reproducibility

Step 4: Node ID Mapping

node_to_int = {node: i for i, node in enumerate(node_list)}
int_to_node = {i: node for node, i in node_to_int.items()}

Purpose: Convert OSM node IDs to sequential integers required by Infomap.

Why necessary:

OSM uses large integers (e.g., 123456789) as node IDs
Infomap expects sequential IDs (0, 1, 2, 3, ...)
Mapping ensures compatibility and memory efficiency

Process:

Create forward mapping: OSM_ID → Sequential_ID
Create reverse mapping: Sequential_ID → OSM_ID
Use forward mapping to add edges to Infomap
Use reverse mapping to interpret results

Step 5: Algorithm Execution

for edge in G.edges():
    source_int = node_to_int[edge[0]]
    target_int = node_to_int[edge[1]]
    im.add_link(source_int, target_int)

im.run()

Purpose: Run the Infomap algorithm on the Venice network.

Process:

Add all edges to Infomap instance using integer IDs
Execute algorithm (typically completes in seconds)
Extract results from the tree structure

Output metrics:

Number of communities: How many distinct regions were found
Codelength: Information-theoretic measure of compression quality (lower is better)
Module assignments: Which community each node belongs to

Step 6: Results Extraction and Modularity Calculation

infomap_communities = {
    int_to_node[node_int]: module 
    for node_int, module in infomap_communities_int.items()
}

infomap_modularity = nx.algorithms.community.modularity(G, infomap_comm_list)

Purpose: Convert results back to original node IDs and evaluate quality.

Modularity:

Standard metric for community quality (range: -1 to 1)
Compares actual community structure to random expectation
Q > 0.3: Strong community structure
Q > 0.5: Very strong community structure
Q < 0.2: Weak or no community structure

Community statistics calculated:

Number of communities
Size of each community (number of nodes)
Geographic bounds (longitude/latitude ranges)
Network density within each community

Visualizations

Visualization 1: Venice street network

Figure 1: Raw Venice street network downloaded from OpenStreetMap

This figure shows the raw street network of Venice, as obtained from OpenStreetMap.

Nodes represent street intersections.
Edges represent navigable paths connecting these intersections.
The network captures the overall connectivity and layout of Venice’s streets, including canals and pedestrian routes.
This visualization provides the foundation for community detection analysis, as it allows us to identify clusters of intersections that form structurally coherent regions.

Visualization 2: Basic Community Map

# All communities shown with different colors

Purpose: Overview of the complete community structure in geographic space.

Features:

Each community rendered in unique color (up to 20 distinct colors)
Node size: 30 pixels for visibility
Edges: Gray with 15% transparency to show connectivity without cluttering
Legend: Lists all communities with their sizes
Title: Shows number of communities and modularity score

Figure 2: Complete community structure of Venice street network with Infomap detection

Each color represents a distinct community detected by the Infomap algorithm.
Nodes located in spatial proximity are frequently assigned to the same community.
Communities reflect structural groupings within Venice's street network.
Larger communities correspond to areas with higher internal connectivity.

Additional Details

Each point represents a street intersection (graph node).
Lines represent navigable paths connecting these intersections.
Identified communities correspond to meaningful structural or functional zones.
Modularity = 0.814 , indicating the clarity of community separation.
The inset map on the right indicates the spatial position of the displayed region within the broader Venice network.

Visualization 3: Multi-Panel Detailed Analysis

Four complementary views in a single figure:

Panel A: Community Distribution Map

All nodes colored by community membership
Geographic layout using true coordinates
Comprehensive view of spatial organization

Panel B: Community Size Distribution

Histogram showing frequency of different community sizes
Reveals whether structure is balanced or hierarchical
Helps identify dominant communities

Panel C: Top 6 Largest Communities Highlighted

Background nodes in light gray for context
Five largest communities in distinct bright colors
Focus on major structural divisions

Panel D: Size vs. Geographic Spread Scatter Plot

X-axis: Number of nodes in community
Y-axis: Geographic extent (computed from coordinate range)
Each point is a community, colored by ID
Reveals compact vs. dispersed communities

Figure 3: Four-panel detailed analysis of community structure

Panel A – Overall Community Structure :

Shows all detected Infomap communities projected on the physical map of Venice.
Each community is consistently color-coded throughout the figure, spatial clusters represent regions of the street network that are internally well connected.

Panel B – Distribution of Nodes per Community :

Displays the number of nodes (street intersections) in each community.
The distribution indicates whether Venice has a hierarchical community structure or a more balanced organization of regions.

Panel C – Spatial Layout of the Six Largest Communities :

Communities 1–6 are shown in physical space, with each point representing a street intersection.

Community sizes:

C1: 1530 nodes
C2: 1013 nodes
C3: 927 nodes
C4: 870 nodes
C5: 785 nodes
C6: 785 nodes

These communities occupy distinct geographic regions within the city.

Panel D – Relationship Between Geographic Size and Number of Intersections :

Geographic spread is measured in degrees.

Each point represents one community, plotted by its physical size and number of nodes.

The strong positive correlation (R = 0.842, P = 3.544e-02) indicates that communities with larger geographic extent tend to contain more intersections.
Community 2 contains more intersections than expected given its geographic size, making it an outlier.

Visualization 4: Individual Community Analysis

# Top 6 largest communities shown individually

Purpose: Detailed examination of major communities in isolation.

For each of the 6 largest communities:

Background: All nodes in light gray (context)
Highlight: Community nodes in red, size 40 pixels
Internal edges: Red, thick (1.0 width)
Statistics: Size, number of internal edges
Geographic context: Community position relative to whole network

Figure 4: Individual visualizations of the six largest communities

Interpretation:

Shape: Linear (canal-following) vs. compact (island-centered)
Density: Number of internal connections
Position: Which part of Venice (north, south, main island, outer islands)
Boundaries: How community interfaces with neighbors

Use cases:

Identifying specific neighborhoods
Understanding local connectivity patterns
Finding natural entry/exit points

Visualization 5: Density Heatmap Analysis

Two complementary views:

Left Panel: Community Membership

Scatter plot with nodes colored by community ID
Colorbar showing community assignments
Clear visualization of boundaries

Right Panel: Geographic Density

Heatmap showing local node density
Uses spatial KD-tree for efficient radius queries
Warmer colors (red/orange) = denser street networks
Cooler colors (yellow) = sparser areas

Figure 5: Community membership and geographic density comparison

Interpretation:

Compare community boundaries with density patterns
High-density areas often form community cores
Low-density regions may be boundaries or parks
Reveals whether communities correspond to dense vs. sparse regions

Technical note: Density computed using radius = 0.001 (coordinate units), counting neighbors within this distance.

Output Files

1. `venice.graphml`

Format: GraphML (Graph Markup Language) - XML-based graph format

Contents:

All nodes with their OSM IDs and coordinates
All edges with their connections
Metadata attributes (street names, types, etc.)

Usage:

Can be loaded into Gephi for additional visualization
Reusable for other analyses without re-downloading
Portable across different graph libraries

2. `venice_infomap_communities.csv`

Format: CSV with node-level community assignments

Column	Description
`node_id`	Original OSM node identifier
`longitude`	East-West coordinate
`latitude`	North-South coordinate
`infomap_community`	Assigned community ID (0, 1, 2, ...)

Usage:

Import into GIS software (QGIS, ArcGIS)
Further statistical analysis
Correlation with demographic or economic data
Validation against official boundaries

Example rows:

node_id,longitude,latitude,infomap_community
260616588,12.3377289,45.4371711,0
260616589,12.3378123,45.4372456,0
260616590,12.3375678,45.4369234,1

3. `venice_infomap_community_stats.csv`

Format: CSV with community-level aggregated statistics

Column	Description
`community_id`	Unique community identifier
`size`	Number of nodes (intersections)
`num_edges`	Number of internal connections
`density`	Network density (0 to 1)
`center_lon`	Geographic center longitude
`center_lat`	Geographic center latitude
`lon_min`, `lon_max`	Longitude bounding box
`lat_min`, `lat_max`	Latitude bounding box

Computed metrics:

Density = (actual edges) / (possible edges) = E / [N(N-1)/2]
- Density = 1: Fully connected (every node connected to every other)
- Density = 0: No internal connections
- Typical range: 0.01 - 0.15 for spatial networks
Geographic center: Mean of all node coordinates
Bounding box: Smallest rectangle containing all community nodes

Usage:

Quantitative comparison between communities
Statistical analysis of community characteristics
Identifying outliers (unusually large, dense, or dispersed)

Community Statistics Output

The script prints detailed statistics for the top 5 largest communities:

Community 0 (Rank #1):
  Size: 2847 nodes
  Geographic center: (12.338456, 45.437234)
  Geographic bounds:
    Longitude: [12.315678, 12.356789]
    Latitude:  [45.425678, 45.448901]
  Network statistics:
    Edges: 3456
    Density: 0.0856
    Avg clustering: 0.2341

Interpretation:

Size: Larger communities may represent main islands or districts
Center: Can be mapped to actual Venice locations
Bounds: Geographic extent of the community
Edges: More edges = better internal connectivity
Density: Higher = more connected (grid-like vs. tree-like)
Clustering: Local connectivity measure (0 to 1)

Method 2: Girvan-Newman Community Detection

Overview

This method provides an alternative hierarchical approach to community detection. Unlike Infomap's information-theoretic method, Girvan-Newman uses edge betweenness to progressively divide the network, providing multi-scale community structure analysis.

(Currently commented out due to computational intensity, but included for comparative analysis).

Methodology

What is Girvan-Newman?

The Girvan-Newman algorithm is a divisive hierarchical clustering method that works by:

Computing edge betweenness: For each edge, count how many shortest paths between all node pairs pass through it
Removing highest betweenness edge: The edge most "between" communities is likely a bridge
Recalculating betweenness: After removal, recompute for remaining edges
Iterating: Repeat until network is fully decomposed
Creating dendrogram: Track the sequence of divisions

Key insight: Edges connecting different communities will have high betweenness because they're on many shortest paths. Removing them reveals community structure.

Hierarchical Output

Unlike Infomap's flat partition, Girvan-Newman produces a hierarchy:

Level 0: 2 communities (first split)
Level 1: 3+ communities (subsequent splits)
Level 2: More communities (process continues)
...
Level N: Eventually isolated nodes

This allows exploring community structure at multiple resolutions.

Comparison with Infomap

Aspect	Infomap	Girvan-Newman
Approach	Information-theoretic (random walks, compression)	Structural (edge betweenness, shortest paths)
Output	Single flat partition	Hierarchical dendrogram
Speed	Fast: O(E log E)	Slow: O(m²n) where m=edges, n=nodes
Memory	Modest: O(N + E)	High: O(N + E) storage, O(N²) operations
Best for	Large networks, single resolution	Smaller networks, multi-scale analysis
Resolution control	Markov time parameter	Choose level in hierarchy
Deterministic	Stochastic by default, deterministic with seed	Deterministic (same input = same output)

Computational considerations for Venice:

Venice network (largest component): ~14,000-16,000 nodes, ~18,000-20,000 edges
Girvan-Newman: Estimated 10-30 minutes for 15 levels (hardware dependent)
Infomap completes same analysis in ~5 seconds
Trade-off: Girvan-Newman provides hierarchical insights not available from Infomap

Code Implementation

Step 1: Hierarchical Community Generation

from networkx.algorithms.community import girvan_newman

gn_iterator = girvan_newman(G)
max_levels = 15  # Limit iterations for computational efficiency

Purpose: Generate community structures at multiple scales.

Process:

girvan_newman(G) returns an iterator (lazy evaluation)
Each iteration removes one edge and recalculates
max_levels = 15 limits to first 15 divisions
Full decomposition (to N communities) would be impractically slow

Step 2: Modularity Tracking

for i, communities in enumerate(gn_iterator):
    if i >= max_levels:
        break
    
    comm_list = [set(c) for c in communities]
    modularity = nx.algorithms.community.modularity(G, comm_list)
    
    gn_communities_levels.append({
        'level': i,
        'num_communities': len(comm_list),
        'communities': comm_list,
        'modularity': modularity
    })

Purpose: Evaluate quality of partitions at each hierarchical level.

Why track modularity:

Not all levels in the hierarchy are equally meaningful
Modularity peaks at the "natural" scale of community structure
Highest modularity level is typically chosen as the best partition

Output: List of partitions with their modularity scores, allowing selection of optimal resolution.

Step 3: Best Level Selection

best_level = max(gn_communities_levels, key=lambda x: x['modularity'])
gn_comm_list = best_level['communities']
gn_modularity = best_level['modularity']

Purpose: Automatically select the hierarchical level with strongest community structure.

Alternative strategies:

Choose level with specific number of communities (e.g., 6 for Venice sestieri)
Use domain knowledge about expected structure
Visualize entire hierarchy and manually select

Visualizations Generated

1. Main Community Map

File: venice_girvan_newman_communities.png

Figure 1: Girvan-Newman Community Detection - Complete Venice Network

What this visualization shows:

This map displays the complete Venice street network with nodes colored according to their community membership as determined by the Girvan-Newman algorithm. Each color represents a distinct community, with colors assigned by size rank (largest community gets the first color, second-largest gets the second color, etc.).

Key elements:

Network structure: Gray edges show the complete street connectivity of Venice
Community nodes: Colored points represent street intersections, with each color indicating community membership
Legend: Shows the top 10 largest communities with their node counts (e.g., "#1: 3,247 nodes")
Title information: Displays the total number of communities detected, the hierarchical level selected, and the modularity score (typically 0.3-0.5)
Explanatory text box: Upper-left corner provides algorithm details and quality metrics

What to look for:

Geographic coherence: Do communities form spatially contiguous regions?
Size distribution: Is there one dominant community or several comparable ones?
Spatial patterns: Do communities align with known Venice districts (sestieri) or waterways?
Boundary regions: Where do different communities meet? These are typically bridges or major thoroughfares.

2. Multi-Panel Detailed Analysis (2×2 grid)

File: venice_girvan_newman_detailed_analysis.png

Figure 2: Girvan-Newman Detailed Analysis - Four-Panel Comprehensive View

What this visualization shows:

This four-panel figure provides a comprehensive analytical view of the Girvan-Newman results, combining spatial, statistical, and hierarchical perspectives.

Panel-by-panel breakdown:

Top-Left: Community Spatial Distribution

Shows the same community map as Figure 1, but with the full Venice street network visible as background context
Demonstrates how communities are distributed across Venice's geographic space
Helps identify whether communities correspond to islands, neighborhoods, or other spatial features

Top-Right: Community Size Distribution (Bar Chart)

X-axis: Community rank (1 = largest, 2 = second-largest, etc.)
Y-axis: Number of nodes in each community
Shows the inequality in community sizes
Typically reveals a power-law-like distribution with a few large communities and many small ones
Helps understand if the network has a dominant "core" community or balanced structure

Bottom-Left: Modularity vs Hierarchical Level

X-axis: Hierarchical level (0 to 14 in our implementation)
Y-axis: Modularity score (quality metric, range −0.5 to 1.0)
Green line: Shows how modularity changes as more edges are removed
Red dashed line: Marks the level with maximum modularity (the "best" partition)
Reveals the optimal level for community detection
Typically shows modularity rising, peaking, then declining as the network becomes over-fragmented

Bottom-Right: Number of Communities vs Level

X-axis: Hierarchical level (0 to 14)
Y-axis: Number of communities detected
Orange line: Shows how the network progressively divides
Red dashed line: Marks the optimal level (same as bottom-left)
Demonstrates the hierarchical nature of the algorithm
Generally increases monotonically, though the rate may vary

Key insights from this figure:

Optimal resolution: The level where modularity peaks indicates the "natural" community structure
Hierarchical structure: Shows whether Venice has clear multi-scale organization
Size inequality: Reveals whether the network is dominated by one large component or has balanced communities
Quality validation: High modularity (>0.3) indicates strong community structure

3. Individual Community Views (2×3 grid)

File: venice_girvan_newman_individual_communities.png

Figure 3: Top Six Largest Communities - Individual Detailed Views

What this visualization shows:

This six-panel figure zooms into the largest six communities individually, showing each one in detail against the backdrop of the complete Venice street network.

Panel structure:

Each subplot shows one community (ranked #1 through #6 by size)
Gray lines: Complete Venice street network (for geographic context)
Colored nodes: Members of the highlighted community
Colored edges: Internal connections within the community

Information displayed for each community:

Title: Community ID, size rank, and node count (e.g., "Community 5 (Rank #1): 3,247 nodes")
Edge count: Number of internal edges showing connectivity
Geographic extent: Full Venice map maintained as reference
Spatial coherence: Whether the community forms a contiguous region

What to analyze:

For individual communities:

Geographic coherence: Does the community occupy a single contiguous region or multiple disconnected areas?
Internal connectivity: Dense edge networks suggest well-connected neighborhoods
Spatial extent: Some communities may be geographically compact, others sprawling
Boundary definition: Clear separation from other communities or fuzzy boundaries?

Across all six panels:

Spatial distribution: Do large communities occupy different parts of Venice (e.g., main island vs. outer islands)?
Size-geography relationship: Are larger communities more sprawling or just more densely connected?
Structural differences: Do some communities show grid-like patterns while others are more organic?
Real-world correspondence: Do communities align with known Venice features like:
- Historic sestieri (San Marco, Castello, Cannaregio, etc.)
- Major islands (Murano, Burano)
- Main thoroughfares or canal systems

Typical observations for Venice:

Largest community (Rank #1): Often covers the main island's central core
Second/third communities: May correspond to major sestieri or outer islands
Smaller communities: Often represent peripheral areas, bridges, or isolated island groups
Geographic logic: Well-detected communities should make intuitive sense given Venice's canal-separated structure

This detailed view allows validation of the algorithm's results against real-world knowledge of Venice's urban geography and helps identify whether detected communities have meaningful interpretations beyond pure network statistics.

Detailed Community Analysis

def analyze_communities(communities, name):
    for i, comm in enumerate(sorted(communities, key=len, reverse=True)[:5]):
        # Print statistics for top 5 communities

Purpose: Detailed statistics for largest communities.

For each of top 10 communities:

Size (number of nodes)
Geographic center (longitude, latitude)
Geographic bounds (bounding box)
Geographic spread (spatial extent)
Network statistics (edges, density, clustering coefficient)

Use cases:

Understanding community structure
Validating geographic coherence
Comparing with expected Venice districts (sestieri)

Output Files

1. `venice_girvan_newman_communities.csv`

Node-level community assignments

Column	Description
`node_id`	OSM node identifier
`longitude`	Geographic coordinate (X)
`latitude`	Geographic coordinate (Y)
`girvan_newman_community`	Community ID (0 to N-1)

2. `venice_girvan_newman_community_stats.csv`

Community-level statistics

Column	Description
`community_id`	Community identifier
`size`	Number of nodes in community
`num_edges`	Internal edges within community
`density`	Internal edge density [0,1]
`center_lon`, `center_lat`	Geographic centroid
`lon_min`, `lon_max`	Longitude bounds
`lat_min`, `lat_max`	Latitude bounds

3. `venice_girvan_newman_hierarchy.csv`

Hierarchical level analysis

Column	Description
`level`	Hierarchical level (0 to max_levels-1)
`num_communities`	Communities at this level
`modularity`	Modularity score [−0.5, 1]

Usage:

Track how modularity changes across hierarchy
Identify optimal number of communities
Visualize hierarchical decomposition process

When to Use Girvan-Newman

Use if:

Working with smaller networks (< 5,000 nodes preferred)
Need hierarchical community structure
Want to explore multi-scale organization
Have computational resources (multi-core CPU, time)
Interested in method comparison/validation

Alternative (use Infomap) if:

Working with large networks (> 20,000 nodes)
Only need single-resolution partition
Limited computational time
Primary interest is fast analysis

For Venice: Both methods are feasible, Girvan-Newman provides additional hierarchical insights at the cost of longer runtime.

Computational estimates for Venice:

Full 15 levels: 30-120 minutes (depends on hardware)
Single level: 2-15 minutes
Recommendation: Run on computing cluster

Enabling the Algorithm

To run Girvan-Newman analysis:

Locate the commented section in the Python script
Uncomment the entire Girvan-Newman block (remove ''' at the first and last).

Adjust parameters if needed:

max_levels = 15  # Reduce to 2-5 for faster execution

Run the script and monitor progress
Compare results with Infomap

Testing recommendation: First test on a smaller network or subgraph to estimate runtime.

Method 3: MBRW Spectral Dimension Analysis

Overview

This MATLAB-based method characterizes the geometric and fractal properties of the Venice street network using Memory-Biased Random Walks (MBRW). While Methods 1 and 2 identify what the communities are, Method 3 reveals how the network scales across different size ranges - from individual streets to the entire city.

Methodology

What is Memory-Biased Random Walk?

MBRW is an enhanced random walk process that explores network structure more effectively than simple random walks:

Standard Random Walk:

At each step, randomly choose a neighbor
Problem: Frequently backtracks, doesn't explore efficiently
Like a drunk person wandering aimlessly

Memory-Biased Random Walk:

Remembers recently visited nodes (memory)
Heavily penalizes returning to recent locations (bias)
Explores systematically like an intentional navigator

Mathematical formulation:

Memory (m): Remember last m visited nodes
Bias (b): Probability of revisiting recent nodes reduced by factor b
Transitions: P(i→j) ∝ 1/b if j was visited recently, ∝ 1 otherwise

What is Spectral Dimension?

The spectral dimension (Dq) generalizes the concept of dimensionality for complex networks:

Intuition:

D = 1: Network behaves like a line (1-dimensional)
D = 2: Network behaves like a plane (2-dimensional)
D = 3: Network behaves like a volume (3-dimensional)
1 < D < 2: Fractal structure (incomplete space-filling)

Technical definition: Dq measures how the "mass" (number of visited nodes) scales with "radius" (walk length):

Mass(L) ∝ L^(Dq)

where L is the walk length and q is the moment order.

Multiple dimensions (Dq spectrum):

q → -∞: Emphasizes rarely visited regions
q = 0: Counts distinct visited nodes
q = 1: Entropy-based (information dimension)
q = 2: Variance-based (correlation dimension)
q → +∞: Emphasizes frequently visited regions

Multifractality: If Dq varies with q, the network is multifractal - different parts have different scaling properties. This indicates hierarchical organization.

Why This Matters for Venice

Venice's unique structure creates interesting geometric properties:

Canal constraints: Limit connectivity, creating fractal-like patterns
Bridge bottlenecks: Create hierarchical scales (local islands, bridge connections, city-wide)
Historical growth: Organic development leads to self-similar patterns
Spatial embedding: Unlike online networks, Venice is constrained to 2D space

Expected findings:

Dq ≈ 1.5-2.2: Between linear and fully planar
ΔDq > 0: Multifractal behavior from heterogeneous structure
Correlations: Spectral dimension changes may align with community boundaries

Code Workflow (MATLAB)

Step 1: Network Loading and Preprocessing

venice_graphml_file = 'networks/venice.graphml';

Purpose: Load the Venice network exported from the Python script.

Streaming parser implementation:

fid = fopen(venice_graphml_file, 'r', 'n', 'UTF-8');
while ~feof(fid)
    line = fgetl(fid);
    % Parse nodes and edges line-by-line
end

Why streaming?

GraphML files can be 50-200 MB for large networks
Loading entire file into memory inefficient
Line-by-line processing keeps memory footprint low
Enables processing networks larger than available RAM

What gets extracted:

Node IDs: Unique identifiers from GraphML
Edge list: Connections between nodes
Coordinates: LINESTRING geometries from edges
- First coordinate → source node position
- Last coordinate → target node position
Geographic data: Longitude and latitude for visualization

Node ID remapping:

node_to_int = containers.Map(node_ids, 1:node_count);

OSM IDs are large random integers (e.g., 5698234712). MATLAB graphs work better with sequential IDs (1, 2, 3, ...). Remapping saves memory and improves performance.

Step 2: 2-Core Extraction

while true
    node_degrees = degree(G_2core);
    low_degree_nodes = find(node_degrees < 2);
    if isempty(low_degree_nodes)
        break;
    end
    G_2core = rmnode(G_2core, low_degree_nodes);
end

Purpose: Extract the network "skeleton" by removing dead-end streets.

What is a k-core?

A k-core is the maximal subgraph where every node has degree ≥ k
2-core means every node has at least 2 connections
Removes "whiskers" (degree-1 nodes) and isolated edges

Why this is necessary for MBRW:

Dead ends bias results: Random walker gets trapped in cul-de-sacs
Artificial features: Many dead ends are data artifacts (address points, parking lots)
Core structure: 2-core represents the true interconnected street network
Computational efficiency: Fewer nodes → faster MBRW simulation

Iterative process:

Removing degree-1 nodes can create new degree-1 nodes
Must iterate until no more removals possible
Typically converges in 2-5 iterations

Statistics printed:

Iteration 1: Removed 3245 nodes with degree < 2
Iteration 2: Removed 456 nodes with degree < 2
Iteration 3: Removed 23 nodes with degree < 2
2-core graph has 28543 nodes and 41267 edges

Step 3: Largest Connected Component (LCC)

cc_bins = conncomp(G_2core);
[largest_size, largest_cc_idx] = max(component_sizes);
G = subgraph(G_2core, largest_cc_nodes);

Purpose: Ensure analysis is performed on a single connected network.

Why necessary:

Venice includes small disconnected islands (Torcello, Sant'Erasmo, etc.)
MBRW cannot traverse between disconnected components
Analysis on disconnected graphs gives misleading results
Largest component typically contains >98% of nodes

Component analysis:

component_sizes = zeros(num_components, 1);
for i = 1:num_components
    component_sizes(i) = sum(cc_bins == i);
end

Prints all component sizes to identify if any significant portions are being excluded.

Output: Clean, connected network saved as venice_2core_lcc.txt in edge list format:

This format is required by the MBRW C++ executable.

Step 4: Infomap Community Detection (MATLAB)

markov_times = [50, 70]; % add more/less if neccessary...
infomap_command = sprintf('"%s" "%s" infomap_output --undirdir --two-level --markov-time %d --num-trials 10', ...
    infomap_exe, infomap_input_file, mt);

Purpose: Detect communities in MATLAB environment for integration with MBRW analysis.

Why run Infomap in MATLAB?

Independent validation of Python results
Enables correlation analysis with spectral dimensions
Allows integrated visualizations in single environment
Tests robustness across different implementations

Export to Pajek format:

fprintf(fid, '*Vertices %d\n', numnodes(G));
fprintf(fid, '*Edges\n');

Pajek (.net) format is Infomap's native input format, simpler than GraphML.

Parameter sweep: Tests multiple Markov time values to find configuration producing 6 communities (Venice's traditional sestieri):

MT = 50: Typically 8-12 communities
MT = 70: Typically 5-8 communities
Tries 10 independent runs (--num-trials 10) for robustness

Version compatibility handling:

% Try different flags for Infomap compatibility
[status, cmdout] = system(infomap_command);
if status ~= 0
    % Try alternative flag
    infomap_command = sprintf('... --undirdir ...');
end

Different Infomap versions use different command-line flags. Code tries multiple variations.

Fallback mechanism: If Infomap executable fails, falls back to native MATLAB implementation:

[infomap_communities, infomap_modularity] = louvain_communities_matlab(G, 6);

Uses k-means clustering on node degrees as simple alternative.

Step 5: Community Visualization

h1 = plot(G, 'XData', plot_x, 'YData', plot_y, ...
     'EdgeColor', [0.7 0.7 0.7], 'EdgeAlpha', 0.3, ...
     'MarkerSize', 6, 'LineWidth', 0.5);
h1.NodeCData = infomap_communities;

Purpose: Create publication-quality visualizations using true geographic coordinates.

Coordinate mapping challenge: After 2-core extraction and LCC selection, node indices change. Need to carefully map:

Original GraphML node IDs → Coordinates
Filtered graph nodes → Original IDs
Coordinates → Plot positions

Multi-panel figure:

Panel 1: Network with Communities

Geographic layout (longitude/latitude)
Nodes colored by community
Semi-transparent edges for context

Panel 2: Size Distribution

Histogram of community sizes
Shows whether communities are balanced

Panel 3: Ranked Sizes

Bar chart sorted by size
Identifies hierarchy

Panel 4: Parameter Sweep

Dual y-axis showing communities vs. modularity
Helps choose optimal Markov time

Panel 5: Statistics Table

Text listing all communities
Sizes and modularity metrics

Figure 7: MATLAB multi-panel community detection analysis

Step 6: Geographic Community Map

figure('Position', [100 100 1200 900], 'Name', 'Venice Communities - Geographic View');
h = plot(G, 'XData', plot_x, 'YData', plot_y, ...);
colormap(hsv(num_infomap_communities));

Purpose: High-quality geographic visualization for presentations.

Features:

True Venice coordinates: Accurate geographic layout
HSV colormap: Maximum color distinction between communities
High transparency edges: Emphasize nodes over connections
Proper axis labels: Longitude and Latitude
Colorbar: Shows community IDs
High resolution: 300 DPI for publication quality

Coordinate validation:

valid_coords = ~isnan(plot_x) & ~isnan(plot_y);
coords_available = sum(valid_coords);

Checks what percentage of nodes have valid coordinates. Requires >30% for geographic plot.

Subgraph for valid coordinates: If some nodes lack coordinates, creates subgraph with only nodes that have geographic data:

G_geo = subgraph(G, nodes_with_coords);

This ensures clean visualization without missing data artifacts.

Figure 8: High-quality geographic community visualization

Step 7: MBRW Parameter Configuration

bias = 10000;
memory = 5;
rand_seed = 0;
log_num_steps = 23;

Purpose: Configure the memory-biased random walk simulation.

Parameter explanations:

bias = 10000 (Very high exploration bias)

Probability of returning to recently visited node reduced by factor 10,000
Effect: Walker strongly prefers unexplored regions
Why high: Venice is dense, need strong bias to avoid local trapping
Lower values (100-1000): More local exploration
Higher values (50000+): Almost never revisit

memory = 5 (Remember last 5 nodes)

Walker tracks last 5 visited nodes
Prevents immediate backtracking: can't return to last 5 nodes easily
Simulates intelligent navigation
Typical range: 3-7
- Too low (1-2): Excessive backtracking
- Too high (10+): Too restrictive, may get stuck

rand_seed = 0 (Reproducibility)

Controls random number generator
Same seed = identical results
Critical for scientific reproducibility
Change to test sensitivity

log_num_steps = 23 (Simulate up to 2²³ ≈ 8.4 million steps)

Covers scales from 1 edge to full network diameter
Need wide range to observe scaling behavior
Venice diameter: ~100-200 hops
Factor of 1000 safety margin ensures full coverage
Each doubling explores one more scale

Orders (moment orders q):

finite_orders = -19:19;
orders = [-Inf finite_orders Inf];

Computes spectral dimensions for q from -∞ to +∞:

q = -19 to -1: Emphasize rarely visited nodes
q = 0: Count distinct visited nodes
q = 1 to 19: Emphasize frequently visited nodes
q = -∞, +∞: Extreme emphasis

Step 8: C++ Compilation and Execution

compile_command = sprintf('g++ -std=c++0x -I %s %s.cpp -o %s -O2', ...
    boost_path, c_executable_name, c_executable_name);

Purpose: Compile and run the MBRW simulation in C++ for computational efficiency.

Why C++ instead of MATLAB?

Speed: C++ is 50-100× faster than MATLAB for this task
Memory: More efficient data structures
Boost library: Provides optimized graph algorithms
Scale: Can handle millions of random walk steps

Compilation flags:

-std=c++0x: Use C++11 standard features
-I %s: Include Boost library path
-O2: Optimization level 2 (good speed, reasonable compile time)
Alternative: -O3 (maximum optimization, slower compilation)

Execution:

run_command = sprintf('%s%s -i %s -q %s -o %s -b %u -m %u -r %u -c %u', ...
    dot_if_linux, c_executable_name, mbrw_able_edge_file_name, ...
    order_file_name, segment_mass_log_multimean_file_name, ...
    bias, memory, rand_seed, log_num_steps);

Command-line arguments:

-i: Input edge list file
-q: Orders file (q-values to compute)
-o: Output file for results
-b: Bias parameter
-m: Memory parameter
-r: Random seed
-c: log₂(max steps)

Caching:

if ~exist(segment_mass_log_multimean_file_name, 'file')
    % Compile and run
end

Only recompiles/reruns if output file doesn't exist. Saves time on repeated analyses.

Typical runtime:

Venice (30K nodes): 10-30 minutes
Progress printed to console
Can be interrupted and resumed

Step 9: Segment Mass Analysis

log2_generalized_means = readmatrix(segment_mass_log_multimean_file_name);

Purpose: Analyze how network exploration scales with walk length.

Data structure:

Rows: Segment lengths (2⁰, 2¹, 2², ..., 2²³)
Columns: Different q-values (-∞, -19, -18, ..., 0, ..., 18, 19, +∞)
Values: log₂ of generalized mean segment mass

What is segment mass? For a walk of length L:

Divide walk into segments of length L
For each segment, count visited nodes
Compute generalized mean across all segments:
- M_q = [mean(mass^q)]^(1/q)

Physical interpretation:

M₀: Number of distinct nodes visited
M₁: Exponential of entropy (information measure)
M₂: Standard deviation of mass distribution
M_∞: Maximum mass (most-visited node count)

Interpretation:

Linear regions: Power-law scaling (fractal behavior)
Slope: Related to spectral dimension
Curvature: Indicates finite-size effects or crossover scales
Separation between curves: Indicates multifractality
- Parallel curves → unifractal
- Diverging curves → multifractal

Step 10: Spectral Dimension Computation

Dq = spectral_dimensions_v2(log2_generalized_means, 'length', num_nodes);

Purpose: Extract spectral dimensions from segment mass data.

Algorithm:

Identify scaling region: Find where curves are approximately linear
- Usually middle range: segment length ≈ 16 to 2048
- Avoid small L (local effects) and large L (finite-size effects)
Compute local slopes:
- dM/dL in log-log space
- Use finite differences or robust regression
Extract Dq:
- Slope of log(mass) vs. log(length) equals Dq
- Different q-values give different dimensions

Function signature:

function Dq = spectral_dimensions_v2(log2_means, method, num_nodes)

Parameters:

log2_means: Input data matrix
method: 'length' (use length-based scaling) or 'mass' (use mass-based)
num_nodes: Network size for normalization

Output:

Dq: Array of spectral dimensions, one per q-value
Length = number of q-values (41 in this case)

Key metrics computed:

mean_Dq = mean(finite_order_Dqs);
std_Dq = std(finite_order_Dqs);
deltaDq = max(finite_order_Dqs) - min(finite_order_Dqs);

Mean: Average spectral dimension
Std: Variability across q
ΔDq: Range (multifractal measure)

Visualizations

Visualization 1: Segment Mass Curves

Purpose: Show raw MBRW results and scaling behavior.

Features:

X-axis: log₂(segment length) from 0 to 23
Y-axis: log₂(q-mean segment mass)
Multiple curves: One per q-value, color-coded
Legend: All q-values including ±∞
Grid: For reading values

Color scheme:

line_colors = jet(num_orders);

Blue: Negative q (rare regions)
Green/Yellow: q near 0
Red: Positive q (common regions)

What to look for:

Overall trend: Should increase with length (more steps → more coverage)
Linearity: Straight lines indicate power-law (fractal) scaling
Curvature at large L: Finite-size effects (network has finite size)
Spacing between curves: Larger spacing → stronger multifractality

Figure 10: MBRW segment mass log-log plot

Visualization 2: Spectral Dimension Spectrum

Purpose: Visualize how spectral dimension varies with moment order.

Features:

plot(finite_orders, finite_order_Dqs, 'b-o', 'LineWidth', 2, 'MarkerSize', 6)

X-axis: q from -19 to +19
Y-axis: Spectral dimension Dq
Blue line with circles: Easy to read
Grid: Precise value reading
Text box: Mean ± std deviation

Typical patterns:

Decreasing Dq with increasing q:

q=-19: D=2.1
q=0:   D=1.9
q=19:  D=1.7

Indicates heterogeneous structure - dense regions (high q) have lower dimension.

Flat Dq profile:

All q: D≈1.9 ± 0.05

Indicates uniform, homogeneous network structure.

Sharp transitions:

q<0:  D=2.0
q>0:  D=1.5

May indicate distinct structural regimes or phase transitions.

Figure 11: Spectral dimension spectrum (Dq vs q)

Interpretation for Venice:

Expected mean: 1.7-2.1 (constrained planar network)
Expected ΔDq: 0.1-0.3 (moderate multifractality from canals)
Shape: Likely decreasing (denser core, sparser periphery)

Visualization 3: Combined Analysis Figure

Creates comprehensive figure showing:

Community structure (left)
Spectral dimension profile (right)

Enables visual correlation between:

Community boundaries
Changes in Dq
Geographic features

This integrated view reveals whether community structure corresponds to geometric changes in network organization.

Output Files (MATLAB)

1. `venice_2core_lcc.txt`

Format: Tab-delimited edge list

Properties:

Sequential node IDs (1 to N)
One edge per line
Undirected (each edge listed once)
No weights or attributes

Usage:

Input for MBRW C++ executable
Portable format readable by many tools
Can be loaded into NetworkX, igraph, etc.

Size: ~1-2 MB for Venice network

2. `venice_infomap_communities.csv`

Format: CSV with two columns

NodeID	Community
1	0
2	0
3	1
...	...

Important: NodeIDs are sequential (1 to N) and correspond to the 2-core LCC network, NOT original OSM IDs.

Usage:

Map spectral properties to communities
Statistical analysis of community characteristics
Correlation studies

3. `segment_mass_log_multimeans/venice_b_10000_m_5_r_0_c_23.csv`

Format: CSV matrix (no headers)

Filename encoding:

b_10000: bias parameter
m_5: memory parameter
r_0: random seed
c_23: log₂(max_steps)

Contents:

Rows (24): Segment lengths 2⁰ to 2²³
Columns (41): q-values from -∞ to +∞
Values: log₂(generalized mean segment mass)

Example:

Row 0 (L=1):   [-2.3, -1.8, ..., 0.5, 1.2]
Row 10 (L=1024): [8.4, 9.1, ..., 12.3, 14.7]
Row 23 (L=8M):  [15.2, 15.8, ..., 18.9, 21.3]

Usage:

Raw data for spectral dimension computation
Custom analysis with different methods
Validation and reproducibility
Comparison across networks

Size: ~50-100 KB (small, compressed data)

4. `orders.txt`

Format: Plain text, one value per line

-Inf
-19
-18
...
0
...
18
19
Inf

Purpose: Specifies which q-values to compute.

Usage: Input file for MBRW C++ executable.

5. Generated PNG Images

High-resolution (300 DPI) figures:

venice_infomap_communities.png
- Multi-panel community analysis
- Size distribution
- Parameter sweep
- Statistics
venice_communities_geographic.png
- Geographic layout with communities
- High-quality for presentations
MBRW analysis figures (not auto-saved by default)
- Segment mass curves
- Spectral dimension spectrum
- Use MATLAB's export features to save

Saving additional figures:

print('figure_name', '-dpng', '-r300');

Results Interpretation

Understanding Spectral Dimensions

Physical meaning:

Dq Value	Interpretation
Dq ≈ 1.0	Linear structure (chain-like)
Dq ≈ 1.5	Fractal between 1D and 2D
Dq ≈ 2.0	Planar structure (fills a plane)
Dq ≈ 2.5	Between 2D and 3D
Dq ≈ 3.0	Volumetric (space-filling)

Expected ranges for different networks:

Network Type	Typical Dq	ΔDq	Example
Linear chain	1.0	0.0	Highway system
Tree	1.0-1.3	0.1-0.2	River networks
Fractal	1.3-1.8	0.2-0.4	Coastlines, DLA
Planar grid	2.0	0.0	Manhattan streets
Spatial network	1.5-2.2	0.1-0.3	City streets
Small-world	2.5-4.0	0.3-0.6	Social networks

For Venice specifically:

Expected Dq: 1.7-2.1
- Lower than pure grid (2.0) due to canal constraints
- Higher than tree (1.3) due to bridge connections and islands
Expected ΔDq: 0.15-0.35
- Moderate multifractality from heterogeneous structure
- Denser main islands, sparser outer regions

Understanding Multifractality (ΔDq)

ΔDq = max(Dq) - min(Dq) measures structural heterogeneity:

ΔDq ≈ 0 (unifractal):
- Homogeneous structure
- Similar density everywhere
- Regular grid-like organization
- Example: Manhattan street grid
ΔDq = 0.1-0.3 (weakly multifractal):
- Moderate heterogeneity
- Some dense and sparse regions
- Typical for organic city growth
- Example: Most European cities
ΔDq > 0.5 (strongly multifractal):
- Highly heterogeneous
- Clear hierarchy of scales
- Hub-and-spoke organization
- Example: Airport networks, Internet

Venice interpretation:

ΔDq ~ 0.2: Moderate heterogeneity expected
Sources:
- Dense historic center (San Marco)
- Medium density residential areas
- Sparse outer islands
- Canal-induced constraints

Connecting Community Structure and Spectral Properties

Integration of Methods 1-3:

Infomap communities show modular organization
- Identifies distinct neighborhoods
- Reveals structural divisions
- Based on connectivity patterns
Girvan-Newman shows hierarchical structure
- Multi-scale organization
- Nested communities
- Progressive decomposition
MBRW spectral dimensions show geometric scaling
- How structure changes across scales
- Fractal vs. regular organization
- Dimensional properties

Combined insights:

Community boundaries may correspond to changes in Dq
Large communities may have higher Dq (more planar)
Small communities may have lower Dq (more linear)
Multifractality reflects hierarchy seen in Girvan-Newman

Analysis workflow:

Identify communities (Methods 1 & 2)
Compute Dq for each community separately
Compare spectral properties across communities
Correlate with geographic features (canals, bridges)

Technical Notes

Parameter Tuning - Python (Methods 1 & 2)

Infomap --markov-time:

Lower (20-40): More, smaller communities
Medium (50-70): Balanced resolution
Higher (100-200): Fewer, larger communities

Guidelines:

For Venice: 50-70 works well
For smaller networks: 30-50
For very large networks: 70-100

Girvan-Newman max_levels:

Controls computational time
Each level ≈ 3-10 minutes for Venice
10 levels sufficient for most analyses

Parameter Tuning - MATLAB (Method 3)

MBRW bias:

Low (100-1000): More local exploration
- Use for: Small-scale structure
- Risk: Trapped in local regions
Medium (5000-20000): Balanced
- Use for: General analysis
- Recommended: Start here
High (50000+): Maximum exploration
- Use for: Large, complex networks
- Risk: May miss local structure

MBRW memory:

Low (2-3): More backtracking
- Use for: Trees, sparse networks
Medium (4-6): Balanced
- Use for: Most spatial networks
- Recommended: 5 for Venice
High (7-10): Restricted exploration
- Use for: Dense grids
- Risk: May get stuck

log_num_steps:

Must exceed network diameter by factor 100-1000
Venice (~200 diameter): 23 (8.4M steps) good
Smaller (< 50): 18-20 sufficient
Larger (> 500): 24-25 needed

Testing parameters: Run with different values and check:

Dq values reasonable (1-3 range)
Curves smooth (not jagged)
Clear scaling region visible
Results stable across parameter variations

Reproducibility

Python:

Infomap: --seed 42
Girvan-Newman: Deterministic (no seed needed)
NetworkX functions: Set np.random.seed(42)

MATLAB:

MBRW: rand_seed = 0
Infomap: --seed 42 in command
MATLAB random: rng(0)

Best practices:

Document all parameters in README
Save parameter files with results
Use version control for code
Archive complete output files

Computational Complexity & Requirements

Method 1 - Infomap (Python):

Time: O(E log E) ≈ 5-30 seconds for Venice
Memory: O(N + E) ≈ 500 MB
Hardware: Any modern laptop

Method 2 - Girvan-Newman (Python):

Time: O(m²n) to O(m²n²) ≈ 30-120 minutes for Venice
Memory: O(N²) ≈ 3-8 GB
Hardware: 8+ GB RAM recommended

Method 3 - MBRW (MATLAB):

Time: O(T log T) where T = 2^log_num_steps
- Venice with log_num_steps=23: 10-60 minutes
- Compilation: ~1 minute (one-time)
Memory: O(N × log_num_steps) ≈ 2-4 GB
Hardware: 8+ GB RAM, C++ compiler required

Recommendations:

Minimum: 8 GB RAM, dual-core 2.5+ GHz
Recommended: 16 GB RAM, quad-core 3.0+ GHz
Optimal: 32 GB RAM, 8-core CPU for parallel work

Speed optimization:

Python: Use PyPy for faster execution
MATLAB: Parallel Computing Toolbox for multiple networks
C++: Increase optimization level to -O3
Hardware: SSD for faster file I/O

Summary Table: Method Comparison

Aspect	Infomap	Girvan-Newman	MBRW
What it reveals	Communities	Hierarchy	Geometry
Output	Flat partition	Dendrogram	Dimensions
Time (Venice)	5-30 sec	30-120 min	10-60 min
Memory	500 MB	3-8 GB	2-4 GB
Deterministic	No (use seed)	Yes	No (use seed)
Resolution	Markov time	Level in hierarchy	q-values
Best for	Fast analysis	Multi-scale	Fractal properties
Scalability	Excellent	Poor	Good
Implementation	Python	Python	MATLAB+C++

References

Community Detection

Rosvall, M., & Bergstrom, C. T. (2008). Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences, 105(4), 1118-1123.
- Original Infomap paper
Girvan, M., & Newman, M. E. J. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12), 7821-7826.
- Original Girvan-Newman algorithm
Newman, M. E. J. (2006). Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103(23), 8577-8582.
- Modularity metric definition

Spectral Dimensions & Fractals

Song, C., Havlin, S., & Makse, H. A. (2005). Self-similarity of complex networks. Nature, 433(7024), 392-395.
- Introduces box-counting for networks
Song, C., Havlin, S., & Makse, H. A. (2006). Origins of fractality in the growth of complex networks. Nature Physics, 2(4), 275-281.
- Fractal scaling in networks
Daqing, L., Kosmidis, K., Bunde, A., & Havlin, S. (2011). Dimension of spatially embedded networks. Nature Physics, 7(6), 481-484.
- Spatial network dimensions
Shanker, O. (2007). Defining dimension of a complex network. Modern Physics Letters B, 21(6), 321-326.
- Spectral dimension formalism

Urban Networks

Boeing, G. (2017). OSMnx: New methods for acquiring, constructing, analyzing, and visualizing complex street networks. Computers, Environment and Urban Systems, 65, 126-139.
- OSMnx library for street networks
Porta, S., Crucitti, P., & Latora, V. (2006). The network analysis of urban streets: A primal approach. Environment and Planning B, 33(5), 705-725.
- Urban street network analysis
Batty, M., & Longley, P. (1994). Fractal Cities: A Geometry of Form and Function. Academic Press.
- Comprehensive book on urban fractals

Software Documentation

Infomap: https://www.mapequation.org/infomap/
OSMnx: https://osmnx.readthedocs.io/
NetworkX: https://networkx.org/
Boost C++: https://www.boost.org/
MATLAB Graph Functions: https://www.mathworks.com/help/matlab/graph-and-network-algorithms.html

Author Notes

This comprehensive analysis suite demonstrates three complementary approaches to understanding urban network structure:

Method 1 (Infomap) reveals functional organization - how Venice is divided into coherent neighborhoods based on connectivity. The information-theoretic foundation provides a principled, efficient method for identifying natural divisions.

Method 2 (Girvan-Newman) reveals hierarchical structure - how communities nest within larger super-communities, showing multi-scale organization. Though computationally expensive, it validates and extends Infomap results.

Method 3 (MBRW) reveals geometric properties - how Venice's structure scales from individual streets to the entire city. This captures the fractal and hierarchical nature of urban growth constrained by canals and geography.

Together, these methods answer:

What are the parts? (Infomap)
How are parts organized hierarchically? (Girvan-Newman)
How do parts fit together across scales? (MBRW)

Why Venice?

Venice provides an ideal test case because:

Unique geography: Canal constraints create interesting topological features
Historical structure: Traditional sestieri (6 districts) provide validation
Multi-island system: Natural hierarchical organization
Well-documented: Extensive historical and geographic data available
Tourism interest: Results have practical applications for navigation and planning

Practical Applications

The methods developed here can be applied to:

Urban planning: Identify natural neighborhood boundaries for administrative purposes
Emergency services: Optimize response routes based on community structure
Tourism management: Design walking tours that respect natural districts
Transportation: Plan public transit routes connecting communities efficiently
Historical analysis: Compare detected communities with historical districts
Infrastructure: Identify critical bridges and connections between communities
Sustainability: Understand pedestrian flow patterns for traffic management

Methodological Contributions

This repository demonstrates:

Multi-method validation: Using three independent approaches for robustness
Scale integration: Connecting local (communities) and global (spectral) properties
Practical implementation: Complete, reproducible workflow from data to results
Computational efficiency: Optimizations for large-scale network analysis
Visualization best practices: Publication-quality figures with clear interpretation

Future Work

Potential extensions and research directions:

Community Detection Extensions

Temporal analysis:
- Compare community structure across different historical periods
- Track how Venice's network evolved over centuries
- Analyze seasonal variations (tourist vs. non-tourist seasons)
Multi-layer networks:
- Integrate different transportation modes (pedestrian, water taxi, gondola)
- Create separate layers for different network types
- Study how communities differ across transportation modes
Validation studies:
- Quantitative comparison with official sestieri boundaries
- Ground-truth validation with local resident surveys
- Historical map analysis for temporal validation
Functional correlation:
- Overlay with socioeconomic data (property values, demographics)
- Correlate with points of interest (churches, museums, shops)
- Analyze tourist flow patterns and their relationship to structure

Algorithmic Extensions

Additional algorithms:
- Louvain method for comparison
- Label Propagation for speed
- Spectral clustering for geometric approach
- Ensemble methods combining multiple algorithms
Parameter optimization:
- Systematic grid search for optimal Markov time
- Stability analysis across parameter ranges
- Information-theoretic criteria for parameter selection
Hierarchical Infomap:
- Enable multi-level hierarchy (currently two-level)
- Compare with Girvan-Newman hierarchies
- Develop methods for automatic scale selection

Spectral Analysis Extensions

Community-specific spectral dimensions:
- Compute Dq separately for each detected community
- Compare geometric properties across neighborhoods
- Identify communities with anomalous scaling
Temporal MBRW:
- Analyze how spectral properties changed historically
- Correlate dimension changes with urban development
- Study impact of new bridges on network geometry
Advanced multifractal analysis:
- Full multifractal spectrum analysis
- Hölder exponent distribution
- Singularity spectrum computation
- Identify phase transitions in network structure

Cross-Network Studies

Comparative urban analysis:
- Apply same pipeline to other canal cities (Amsterdam, Bruges, Stockholm)
- Compare Venice with non-canal cities of similar size
- Identify universal vs. Venice-specific properties
- Build typology of urban network structures
Network motifs:
- Identify recurring structural patterns
- Correlate motifs with community boundaries
- Study relationship between motifs and spectral properties

Practical Applications

Navigation optimization:
- Use community structure for hierarchical routing
- Design shortest-path algorithms respecting communities
- Optimize tourist routes to visit multiple districts efficiently
Critical infrastructure:
- Identify bridges critical for inter-community connectivity
- Analyze vulnerability to edge removal (bridge closure)
- Design resilient network modifications
Accessibility analysis:
- Combine with elevation data (Venice flooding)
- Identify communities most vulnerable to acqua alta
- Design emergency evacuation routes

Integration Studies

Community-spectral correlation:
- Statistical analysis of Dq variation across communities
- Test hypothesis: community boundaries = dimension transitions
- Develop unified theory connecting topology and geometry
Machine learning:
- Train models to predict community membership from local features
- Use spectral properties as features for classification
- Develop automated community detection using learned features
Interactive visualization:
- Web-based interface for exploring results
- 3D visualization with building heights
- Time-slider for temporal analysis
- Overlay with satellite imagery

Theoretical Developments

Analytical models:
- Develop theoretical predictions for Dq in canal-constrained networks
- Mathematical relationship between modularity and multifractality
- Optimal network growth models reproducing Venice-like structure
Scaling theory:
- Finite-size scaling analysis
- Crossover phenomena between scales
- Universal scaling laws for spatial networks

Troubleshooting

Common Issues - Python

Problem: ImportError: No module named 'osmnx'

Solution: pip install osmnx

Problem: Infomap segmentation fault

Solution: Update infomap to latest version
pip install --upgrade infomap

Problem: NetworkX version incompatibility

Solution: Ensure NetworkX >= 2.5
pip install --upgrade networkx

Problem: Visualization plots are blank

Solution: Check matplotlib backend
import matplotlib
matplotlib.use('TkAgg')  # or 'Qt5Agg'

Common Issues - MATLAB

Problem: "Cannot open file: venice.graphml"

Solution: Run Python script first to generate the file
Or check file path in MATLAB script

Problem: Boost library not found during compilation

Solution: Update boost_path variable
boost_path = 'C:\boost_1_89_0';  % Adjust path

Problem: g++ compiler not found

Solution (Windows): Install MinGW-w64
Solution (Mac): Install Xcode command line tools
Solution (Linux): sudo apt-get install build-essential

Problem: Infomap executable fails with version error

Solution: Try different command flags
--undirdir or -u or --undirected
Or use fallback native MATLAB algorithm

Problem: Out of memory during MBRW

Solution: Reduce log_num_steps from 23 to 21
Or close other applications
Or use computer with more RAM

Problem: MBRW results look wrong (Dq > 5 or < 0)

Solution: Check bias and memory parameters
Ensure network is connected (check LCC extraction)
Verify input file format (tab-delimited)

Performance Optimization

Python slow on large networks:

Use PyPy instead of CPython
Enable NetworkX caching
Reduce visualization node size
Sample nodes for preview plots

MATLAB slow on large networks:

Increase C++ optimization: -O3
Use parfor for multiple parameter sets
Reduce number of q-values tested
Use compiled MEX files instead of m-files

Disk space issues:

Compress GraphML files after generation
Delete intermediate Infomap output files
Store only essential results
Use binary formats instead of CSV

Contributing

This is an academic project. Suggestions for improvements:

Additional visualizations
- Interactive plots using Plotly
- 3D network embeddings
- Animated temporal evolution
Code optimizations
- Parallel processing for multiple cities
- GPU acceleration for large networks
- More efficient data structures
Documentation improvements
- Video tutorials
- Jupyter notebooks with examples
- Case studies with other cities
Testing
- Unit tests for core functions
- Validation on synthetic networks
- Continuous integration setup

To contribute:

Fork the repository
Create feature branch
Add tests and documentation
Submit pull request

Citation

If you use this code in your research, please cite:

@software{venice_network_analysis,
  author = {[Your-Name]},
  title = {Venice Street Network Analysis Suite: Community Detection and Spectral Dimension Analysis},
  year = {CURRENT-YEAR},
  url = {[Your Repository URL]}
}

And cite the original methods:

@article{rosvall2008infomap,
  title={Maps of random walks on complex networks reveal community structure},
  author={Rosvall, Martin and Bergstrom, Carl T},
  journal={Proceedings of the National Academy of Sciences},
  volume={105},
  number={4},
  pages={1118--1123},
  year={2008}
}

@article{song2005fractals,
  title={Self-similarity of complex networks},
  author={Song, Chaoming and Havlin, Shlomo and Makse, Hern{\'a}n A},
  journal={Nature},
  volume={433},
  number={7024},
  pages={392--395},
  year={2005}
}

@article{boeing2017osmnx,
  title={OSMnx: New methods for acquiring, constructing, analyzing, and visualizing complex street networks},
  author={Boeing, Geoff},
  journal={Computers, Environment and Urban Systems},
  volume={65},
  pages={126--139},
  year={2017}
}

License

This project is released under the MIT License for academic and research purposes.

Acknowledgments

OpenStreetMap contributors for providing street network data
Martin Rosvall for the Infomap algorithm and software
Geoff Boeing for the OSMnx library
Hernán Makse and collaborators for MBRW methodology
NetworkX development team for network analysis tools
MATLAB for scientific computing platform
Boost C++ Libraries for efficient data structures

Contact & Support

For questions about:

Infomap algorithm: See Infomap documentation
OSMnx library: See OSMnx documentation
MBRW methodology: See cited papers (Song et al., Daqing et al.)
This implementation: Contact course instructor Uri Hershberg or Loay Egbaria : Loayegb@gmail.com (Student for CS).

Bug reports: Include:

Operating system and version
Python/MATLAB version
Library versions (pip list or ver in MATLAB)
Complete error message
Minimal reproducible example

Feature requests: Describe:

Use case and motivation
Expected behavior
Example with other networks
Performance requirements

Version History

Version 1.0 (Initial Release)

Complete Infomap community detection pipeline
Girvan-Newman alternative method
MBRW spectral dimension analysis
Comprehensive visualizations
Full documentation

Planned for Version 1.1:

Jupyter notebook tutorials
Additional validation metrics
Performance optimizations
Interactive visualizations
Cross-city comparison tools

For academic use in network science, urban analytics, and complex systems research. Developed for Vinece Project and researches.

Last updated: 15/11/2025

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
MBRW		MBRW
OutputImages		OutputImages
VeniceNetworkFiles		VeniceNetworkFiles
code		code
README.md		README.md
venice_map.jpeg		venice_map.jpeg

Folders and files

Latest commit

History

Repository files navigation

Venice Network Analysis

Overview

What This Repository Contains

Table of Contents

Requirements & Installation

Python Requirements (Methods 1 & 2)

MATLAB Requirements (Method 3)

External Tools

MATLAB Setup Instructions

Method 1: Network Download & Infomap Community Detection

Overview

Methodology

Network Data Acquisition

Why Venice?

What is Infomap?

Code Workflow

Step 1: Network Data Download

Step 2: Graph Preprocessing

Step 3: Infomap Algorithm Configuration

Step 4: Node ID Mapping

Step 5: Algorithm Execution

Step 6: Results Extraction and Modularity Calculation

Visualizations

Visualization 1: Venice street network

Figure 1: Raw Venice street network downloaded from OpenStreetMap

Visualization 2: Basic Community Map

Figure 2: Complete community structure of Venice street network with Infomap detection

Additional Details

Visualization 3: Multi-Panel Detailed Analysis

Panel A: Community Distribution Map

Panel B: Community Size Distribution

Panel C: Top 6 Largest Communities Highlighted

Panel D: Size vs. Geographic Spread Scatter Plot

Figure 3: Four-panel detailed analysis of community structure

Panel A – Overall Community Structure :

Panel B – Distribution of Nodes per Community :

Panel C – Spatial Layout of the Six Largest Communities :

Panel D – Relationship Between Geographic Size and Number of Intersections :

Visualization 4: Individual Community Analysis

Figure 4: Individual visualizations of the six largest communities

Visualization 5: Density Heatmap Analysis

Left Panel: Community Membership

Right Panel: Geographic Density

Figure 5: Community membership and geographic density comparison

Output Files

1. venice.graphml

2. venice_infomap_communities.csv

3. venice_infomap_community_stats.csv

Community Statistics Output

Method 2: Girvan-Newman Community Detection

Overview

Methodology

What is Girvan-Newman?

Hierarchical Output

Comparison with Infomap

Code Implementation

Step 1: Hierarchical Community Generation

Step 2: Modularity Tracking

Step 3: Best Level Selection

Visualizations Generated

1. Main Community Map

2. Multi-Panel Detailed Analysis (2×2 grid)

3. Individual Community Views (2×3 grid)

Detailed Community Analysis

Output Files

1. venice_girvan_newman_communities.csv

2. venice_girvan_newman_community_stats.csv

3. venice_girvan_newman_hierarchy.csv

When to Use Girvan-Newman

Enabling the Algorithm

Method 3: MBRW Spectral Dimension Analysis

Overview

Methodology

What is Memory-Biased Random Walk?

What is Spectral Dimension?

Why This Matters for Venice

1. `venice.graphml`

2. `venice_infomap_communities.csv`

3. `venice_infomap_community_stats.csv`

1. `venice_girvan_newman_communities.csv`

2. `venice_girvan_newman_community_stats.csv`

3. `venice_girvan_newman_hierarchy.csv`

1. `venice_2core_lcc.txt`

2. `venice_infomap_communities.csv`

3. `segment_mass_log_multimeans/venice_b_10000_m_5_r_0_c_23.csv`

4. `orders.txt`