rivkalipko/touchestats

USA Fencing National Tournament Dataset

This dataset contains fencing competition results scraped from usfencingresults.org. It covers pool bouts, direct elimination (DE) bouts, and referee assignments from USA Fencing national events (NACs, Summer Nationals, Junior Olympics, SJCCs). A mirror is available at https://www.kaggle.com/datasets/rivkalipkovitz/usa-fencing-national-tournament-dataset.

Quick Start

Option 1: Use the pre-exported CSV files

The CSV files are ready to use in the data/ folder:

  • data/ts_tournaments.csv - Tournament metadata
  • data/ts_events.csv - Individual events
  • data/ts_referees.csv - Referee information
  • data/ts_event_entries.csv - Fencer results per event
  • data/ts_pools.csv - Pool metadata
  • data/ts_pool_bouts.csv - Individual pool bout scores
  • data/ts_de_bouts.csv - Direct elimination bout scores
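
As a starting point, the files listed above can be read with Python's standard csv module. This is a minimal sketch and not part of the repository: load_table is a hypothetical helper, and it assumes the files are UTF-8 encoded, have a header row, and live in data/ relative to the working directory.

```python
import csv
from pathlib import Path


def load_table(name, data_dir="data"):
    """Read <data_dir>/<name>.csv into a list of dicts keyed by column name.

    Hypothetical helper: assumes UTF-8 files with a header row.
    Every value comes back as a string; empty cells are empty strings.
    """
    path = Path(data_dir) / f"{name}.csv"
    with path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

Calling load_table("ts_events") from the repository root would return one dict per event. If pandas is installed, pandas.read_csv is an alternative that also infers numeric column types.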

Option 2: Generate the data from scratch

The pickle file (usfr_data.p) is too large to include in the repository, so you'll need to generate it yourself:

```bash
# Install dependencies
pip install requests beautifulsoup4 pandas

# Run the scraper (takes ~30-60 minutes)
python usfencingresults.py

# Generate CSVs from the pickle file
python export_usfr_to_csv.py
```

The scraper creates usfr_data.p, a pickle file containing all the scraped data. Once you have the pickle file, you can regenerate the CSVs at any time by re-running export_usfr_to_csv.py.

Note: The scraper outputs a pickle file rather than CSVs directly - this is a bit janky, but I didn't know how to work with CSVs when I originally wrote the code in 2023. The export_usfr_to_csv.py script handles the conversion.


CSV File Descriptions

Tournaments

File: data/ts_tournaments.csv

| Column | Description |
|---|---|
| tournament_id | Unique identifier |
| name | Tournament name (e.g., "OCT NAC", "SN", "JO") |
| year | Year |
| month | Month |

Events

File: data/ts_events.csv

| Column | Description |
|---|---|
| event_id | Unique identifier |
| tournament_id | Foreign key to tournaments |
| date | Event date |
| year, month, day | Date components |
| time, time_half | Start time (e.g., "8:00", "AM") |
| category | Event type: "Div I", "Div II", "Junior", "Cadet", "Y-14", etc. |
| gender | "M" (Men's), "F" (Women's), or "X" (Mixed) |
| weapon | "Foil", "Epee", or "Saber" |
| url | Original results page URL |
| num_fencers | Total fencers in event |
| num_pools | Number of pools |
| num_pool_bouts | Total pool bouts |
| num_de_bouts | Total DE bouts |

Referees

File: data/ts_referees.csv

| Column | Description |
|---|---|
| referee_id | Unique identifier |
| name | Referee's name (lowercase) |
| division | USA Fencing division |
| club | Club affiliation |
| country | Country code |

Event Entries

File: data/ts_event_entries.csv

One row per fencer per event with results and pool statistics.

| Column | Description |
|---|---|
| event_id | Foreign key to events |
| person_id | Fencer identifier (based on name + division + country) |
| name | Fencer's name (lowercase) |
| division | USA Fencing division (e.g., "Metro NYC", "Southern CA") |
| country | Country code (e.g., "USA", "CAN") |
| club1, club2 | Club affiliations |
| result | Final placement (1, 2, 3, etc.) |
| rating | Fencing rating (A, B, C, D, E, or U for unrated) |
| rank | National ranking |

Pool Statistics (per round r0-r3):

| Column | Description |
|---|---|
| seed_r0, seed_r1, ... | Seeding going into each round |
| victories_r0, ... | Number of victories |
| victory_pct_r0, ... | Victory percentage (0.0 to 1.0) |
| touches_scored_r0, ... | Total touches scored |
| touches_received_r0, ... | Total touches received |
| indicator_r0, ... | Indicator (scored - received) |
| status_r0, ... | "Advanced", "Eliminated", etc. |
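
The two derived columns above can be recomputed from the raw counts. The sketch below assumes that victory_pct is victories divided by bouts fenced, which is the usual fencing convention; the number of bouts fenced is not a column in this file, so it is passed in separately. pool_summary is a hypothetical helper and the numbers are made up.

```python
def pool_summary(victories, bouts, touches_scored, touches_received):
    """Return (victory_pct, indicator) for one fencer in one pool round.

    Assumption: victory_pct = victories / bouts fenced.
    The indicator is touches scored minus touches received.
    """
    victory_pct = victories / bouts if bouts else None
    indicator = touches_scored - touches_received
    return victory_pct, indicator


# Made-up example: 4 wins in 6 bouts, 27 touches scored, 18 received.
print(pool_summary(4, 6, 27, 18))
```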

Pools

File: data/ts_pools.csv

| Column | Description |
|---|---|
| pool_id | Unique identifier |
| event_id | Foreign key to events |
| pool_number | Pool number within the round |
| round | Pool round (0 = first round) |
| num_fencers | Fencers in pool |
| num_referees | Referees assigned |
| referee1_id, referee1_name | First referee |
| referee2_id, referee2_name | Second referee (if present) |

Note: To find fencers in a pool, query ts_pool_bouts.csv by pool_id.
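
One way to run that lookup is sketched here, assuming the bout rows have been loaded as dicts keyed by column name (for example with csv.DictReader). fencers_in_pool is a hypothetical helper, and the sample rows are made up for illustration.

```python
def fencers_in_pool(pool_bouts, pool_id):
    """Return the distinct (fencer_id, fencer_name) pairs that appear in one pool."""
    fencers = set()
    for bout in pool_bouts:
        if bout["pool_id"] != pool_id:
            continue
        fencers.add((bout["fencer1_id"], bout["fencer1_name"]))
        fencers.add((bout["fencer2_id"], bout["fencer2_name"]))
    return sorted(fencers)


# Made-up rows shaped like ts_pool_bouts.csv; IDs and names are not real data.
sample_bouts = [
    {"pool_id": "7", "fencer1_id": "a1", "fencer1_name": "fencer a",
     "fencer2_id": "b2", "fencer2_name": "fencer b"},
    {"pool_id": "7", "fencer1_id": "a1", "fencer1_name": "fencer a",
     "fencer2_id": "c3", "fencer2_name": "fencer c"},
    {"pool_id": "8", "fencer1_id": "d4", "fencer1_name": "fencer d",
     "fencer2_id": "e5", "fencer2_name": "fencer e"},
]
print(fencers_in_pool(sample_bouts, "7"))
```

The IDs are compared as strings here because csv.DictReader returns strings; with typed data, pass pool_id in the matching type.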


Pool Bouts

File: data/ts_pool_bouts.csv

| Column | Description |
|---|---|
| bout_id | Unique identifier |
| event_id | Foreign key to events |
| pool_id | Foreign key to pools |
| pool_number | Pool number |
| round | Pool round |
| bout_num | Bout number in pool |
| fencer1_id, fencer1_name | First fencer |
| fencer2_id, fencer2_name | Second fencer |
| score1, score2 | Bout scores |
| winner | Winner's name |
| num_referees | Number of referees |
| referee1_id, referee1_name | First referee |
| referee2_id, referee2_name | Second referee (if present) |
| referee3_id, referee3_name | Third referee (if present) |
| referee4_id, referee4_name | Fourth referee (if present) |

Pool bouts are typically to 5 touches.
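
Per-fencer pool records can be rebuilt from these rows. The following is a sketch under stated assumptions: rows are dicts of strings, the winner column holds the winning fencer's name exactly as it appears in fencer1_name or fencer2_name, and bouts with an empty score are skipped. pool_records is a hypothetical helper and the sample rows are made up.

```python
from collections import defaultdict


def pool_records(pool_bouts):
    """Aggregate victories and touches per fencer_id from pool bout rows."""
    records = defaultdict(lambda: {"victories": 0, "scored": 0, "received": 0})
    for bout in pool_bouts:
        if not bout["score1"] or not bout["score2"]:
            continue  # empty cell: score unavailable
        # int(float(...)) tolerates scores written as "5" or "5.0".
        s1 = int(float(bout["score1"]))
        s2 = int(float(bout["score2"]))
        sides = (
            (bout["fencer1_id"], bout["fencer1_name"], s1, s2),
            (bout["fencer2_id"], bout["fencer2_name"], s2, s1),
        )
        for fencer_id, name, scored, received in sides:
            rec = records[fencer_id]
            rec["scored"] += scored
            rec["received"] += received
            if bout["winner"] == name:
                rec["victories"] += 1
    return dict(records)


# Made-up rows for illustration only.
demo_bouts = [
    {"fencer1_id": "a1", "fencer1_name": "fencer a", "fencer2_id": "b2",
     "fencer2_name": "fencer b", "score1": "5", "score2": "3", "winner": "fencer a"},
    {"fencer1_id": "a1", "fencer1_name": "fencer a", "fencer2_id": "c3",
     "fencer2_name": "fencer c", "score1": "4", "score2": "5.0", "winner": "fencer c"},
]
print(pool_records(demo_bouts))
```

Reading the winner column rather than comparing scores keeps the tally correct for bouts that end level on touches and are decided by priority.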


DE Bouts

File: data/ts_de_bouts.csv

| Column | Description |
|---|---|
| bout_id | Unique identifier |
| event_id | Foreign key to events |
| table | Tableau/bracket number (0-indexed, increases each round) |
| table_of | Traditional DE size (128, 64, 32, 16, 8, 4, 2) |
| round | DE round |
| fencer1_id, fencer1_name | First fencer |
| fencer2_id, fencer2_name | Second fencer |
| score1, score2 | Bout scores |
| winner | Winner's name |
| num_referees | Number of referees |
| referee1_id through referee4_name | Referee info |

DE bouts are typically to 15 touches.

Example: In an event with 128 fencers in DE:

  • table=0, table_of=128 (first round, table of 128)
  • table=1, table_of=64 (second round, table of 64)
  • table=2, table_of=32 (table of 32)
  • ... continuing to table_of=2 (finals)
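
The relationship between table and table_of in this example is a simple halving, which can be sketched as follows. de_tables is a hypothetical helper, and it assumes the first DE table size is a power of two, as in the example above.

```python
def de_tables(first_table_of):
    """List (table, table_of) pairs from the first DE round down to the final.

    Assumes first_table_of is a power of two (e.g. 128).
    """
    if first_table_of < 2 or first_table_of & (first_table_of - 1):
        raise ValueError("first_table_of must be a power of two >= 2")
    tables = []
    size = first_table_of
    while size >= 2:
        tables.append((len(tables), size))  # table index is 0-based
        size //= 2
    return tables


for table, table_of in de_tables(128):
    print(f"table={table}, table_of={table_of}")
```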

Data Relationships

tournaments
    └── events (via tournament_id)
            ├── event_entries (via event_id)
            ├── pools (via event_id)
            │       └── referees (via referee*_id)
            ├── pool_bouts (via event_id, pool_id)
            │       └── referees (via referee*_id)
            └── de_bouts (via event_id)
                    └── referees (via referee*_id)
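
As an illustration of following one of these links, the sketch below attaches the tournament name to each event via tournament_id. attach_tournament is a hypothetical helper, rows are assumed to be dicts keyed by column name, and the sample values are made up.

```python
def attach_tournament(events, tournaments):
    """Return event rows with an added tournament_name, joined on tournament_id."""
    by_id = {t["tournament_id"]: t for t in tournaments}
    joined = []
    for event in events:
        tournament = by_id.get(event["tournament_id"])
        joined.append({
            **event,
            # None marks an event whose tournament_id has no match.
            "tournament_name": tournament["name"] if tournament else None,
        })
    return joined


# Made-up rows for illustration only.
demo_tournaments = [{"tournament_id": "0", "name": "OCT NAC"}]
demo_events = [
    {"event_id": "0", "tournament_id": "0", "weapon": "Foil"},
    {"event_id": "1", "tournament_id": "99", "weapon": "Epee"},
]
for row in attach_tournament(demo_events, demo_tournaments):
    print(row["event_id"], row["tournament_name"])
```

The same pattern applies to the other links in the tree, for example bouts to events via event_id or bouts to referees via the referee*_id columns.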

Data Coverage

  • Years: 2016-2020 seasons
  • Events: NACs, Summer Nationals, Junior Olympics, SJCCs
  • Categories: Div I, Div II, Junior, Cadet, Y-14, Y-12, etc.

Record Counts

| File | Records |
|---|---|
| ts_tournaments.csv | 34 |
| ts_events.csv | 1,089 |
| ts_referees.csv | 1,319 |
| ts_event_entries.csv | 98,274 |
| ts_pools.csv | 14,714 |
| ts_pool_bouts.csv | 290,172 |
| ts_de_bouts.csv | 84,371 |
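
These counts can serve as a sanity check after downloading or regenerating the files. The sketch below assumes that "records" means data rows excluding the header row, and that the files are UTF-8 encoded; count_rows and find_mismatches are hypothetical helpers. A fresh scrape may legitimately produce different counts if the source site has changed.

```python
import csv
from pathlib import Path

# Counts copied from the Record Counts table above.
EXPECTED_ROWS = {
    "ts_tournaments.csv": 34,
    "ts_events.csv": 1089,
    "ts_referees.csv": 1319,
    "ts_event_entries.csv": 98274,
    "ts_pools.csv": 14714,
    "ts_pool_bouts.csv": 290172,
    "ts_de_bouts.csv": 84371,
}


def count_rows(path):
    """Count data rows in a CSV file, excluding the header row."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header
        return sum(1 for _ in reader)


def find_mismatches(data_dir="data", expected=EXPECTED_ROWS):
    """Return {filename: (expected, actual)} for files whose row count differs."""
    mismatches = {}
    for filename, want in expected.items():
        got = count_rows(Path(data_dir) / filename)
        if got != want:
            mismatches[filename] = (want, got)
    return mismatches
```

Running find_mismatches() from the repository root would return an empty dict when every file matches the table.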

Fencing Terminology

  • Pool Round: Round-robin phase where fencers compete against everyone in their pool
  • DE (Direct Elimination): Single-elimination bracket after pools
  • Indicator: Touches scored minus touches received
  • Rating: Skill classification (A=highest through E, U=unrated)
  • Division: Geographic region within USA Fencing
  • NAC: North American Cup
  • SN: Summer Nationals
  • JO: Junior Olympics
  • SJCC: Super Junior/Cadet Circuit

Notes

Indexing

  • All indices are 0-based (pool rounds, table numbers, etc. start at 0)
  • The round field follows FencingTimeLive's round numbering, but 0-indexed:
    • Round 0 = first round of pools (FTL "Round 1")
    • Round 1 = first round of DEs (FTL "Round 2")
    • Round 2 = second round of pools (FTL "Round 3") - if applicable
  • DE table is also 0-indexed: table 0 = first DE round, table 1 = second DE round, etc.
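
Converting between the 0-based round field and FencingTimeLive's 1-based numbering is a shift by one, as the list above implies. The helper names below are hypothetical.

```python
def to_ftl_round(round_index):
    """Convert this dataset's 0-based round to FencingTimeLive's 1-based round."""
    if round_index < 0:
        raise ValueError("round_index must be >= 0")
    return round_index + 1


def from_ftl_round(ftl_round):
    """Convert a FencingTimeLive 1-based round to this dataset's 0-based round."""
    if ftl_round < 1:
        raise ValueError("ftl_round must be >= 1")
    return ftl_round - 1


# Round 0 in the CSVs corresponds to FTL "Round 1", and so on.
for r in range(3):
    print(f"round {r} -> FTL Round {to_ftl_round(r)}")
```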

Person IDs

  • Person IDs are generated from name + division + country at the time of the event
  • The same fencer will have different person_ids if their division or country changed between events
  • This is intentional - it preserves the fencer's affiliation at the time of each competition
  • To track a fencer across events, match on the name column
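
Matching on name can be sketched as a grouping over the event entries. person_ids_by_name is a hypothetical helper, the rows are assumed to be dicts keyed by column name, and the sample values are made up. One caveat of this approach: two different fencers who share a name end up in the same group.

```python
from collections import defaultdict


def person_ids_by_name(entries):
    """Map each fencer name to the set of person_ids it appears under."""
    ids = defaultdict(set)
    for row in entries:
        ids[row["name"]].add(row["person_id"])
    return dict(ids)


# Made-up rows: the same name under two person_ids after a division change.
demo_entries = [
    {"event_id": "0", "person_id": "p1", "name": "fencer a", "division": "Metro NYC"},
    {"event_id": "5", "person_id": "p2", "name": "fencer a", "division": "Southern CA"},
    {"event_id": "5", "person_id": "p3", "name": "fencer b", "division": "Metro NYC"},
]
moved = {name: ids for name, ids in person_ids_by_name(demo_entries).items() if len(ids) > 1}
print(moved)
```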

Other Notes

  • All names are lowercased for consistency
  • Empty cells indicate missing/unavailable data
  • Most events only have round 0 pool data; rounds 1-3 are for large events with multiple pool rounds
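
When reading the files with the csv module, empty cells arrive as empty strings, so numeric columns need a guard before conversion. This is a small sketch with a hypothetical helper name; it treats whitespace-only cells as missing too, which is an assumption.

```python
def parse_optional(cell, cast=str):
    """Convert a CSV cell with cast, returning None for an empty cell."""
    text = cell.strip()
    return cast(text) if text else None


# Example conversions on made-up cell values.
print(parse_optional("", int))
print(parse_optional("12", int))
print(parse_optional("0.75", float))
```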

About

Comprehensive data from USA Fencing national tournaments 2016-2020
