This repository implements an end-to-end machine learning pipeline for predicting the outcome of Field Service Management (FSM) visits. It transforms raw operational and network data into structured modelling tables, trains a binary classifier to predict FAIL vs SUCCESS, and provides local, human-readable explanations for each prediction. The pipeline is packaged as a reusable Python module (fsm_model) alongside notebooks for exploration and experimentation.
From the project root:
```shell
pip install -e .
```

This installs the `fsm_model` package in editable mode and pulls in the dependencies listed in `pyproject.toml` / `requirements.txt`.
Python 3.10+ is recommended.
The package assumes the original challenge data is present under data/raw/:
- `data/raw/visits.txt` – JSONL with one visit per line.
- `data/raw/network.adjlist` – undirected adjacency list of the network graph.
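For readers unfamiliar with these formats, the snippet below shows how such files parse with pandas and NetworkX. The records and node names are invented for illustration; the real `visits.txt` schema is not shown here.

```python
import io

import networkx as nx
import pandas as pd

# Invented two-record JSONL sample -- one JSON object per line.
jsonl = '{"task_id": "TASK0", "visit_id": 0}\n{"task_id": "TASK0", "visit_id": 1}\n'
visits = pd.read_json(io.StringIO(jsonl), lines=True)

# Adjacency-list format: each line is a node followed by its neighbours.
graph = nx.read_adjlist(io.BytesIO(b"A B C\nB D\n"))

print(visits.shape, graph.number_of_nodes(), graph.number_of_edges())
```

The same two calls (`pd.read_json(..., lines=True)` and `nx.read_adjlist(...)`) work directly on the paths under `data/raw/`.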
The full pipeline will then create:
- `data/bronze/` – lightly cleaned but minimally engineered data.
- `data/silver/` – visit-level and node-level features.
- `data/gold/` – final modelling tables and TF–IDF artefacts.
Once installed, the full pipeline can be run from Python using the high-level API.
```python
import fsm_model as fsm

# 1. Build Bronze layer (raw -> bronze)
fsm.build_bronze(
    raw_visits_path="data/raw/visits.txt",
    raw_network_path="data/raw/network.adjlist",
    bronze_dir="data/bronze",
)

# 2. Build Silver layer (bronze -> silver)
fsm.build_silver_layer()  # reads from data/bronze, writes to data/silver

# 3. Build Gold layer (silver -> gold)
fsm.build_gold_layer()  # reads from data/silver, writes to data/gold

# 4. Train LightGBM model (train / val / test)
summary = fsm.run_training(
    gold_data="data/gold",  # or "data/gold/visits_gold_final.parquet"
    model_dir="models/lightgbm_v3",
)
print(summary["val_metrics"]["roc_auc"], summary["test_metrics"]["roc_auc"])

# 5. Batch inference on Gold table
pred_df = fsm.predict_from_gold(
    gold_data="data/gold",  # or a DataFrame
    model_dir="models/lightgbm_v3",
)
print(pred_df.head())

# 6. Local interpretation for a single visit
exp = fsm.explain_visit_from_gold(
    gold_data="data/gold",
    model_dir="models/lightgbm_v3",
    task_id="TASK0",
    visit_id=0,
    top_k=5,
)
print(exp["summary_text"])
```

These steps mirror the notebook workflow, but through reusable functions.
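Step 4 prints ROC AUC for the validation and test splits. For reference, the metric itself can be reproduced directly with scikit-learn; the labels and scores below are invented, not taken from this pipeline.

```python
from sklearn.metrics import roc_auc_score

# Four invented visits (1 = FAIL). Three of the four FAIL/SUCCESS
# pairs are ranked correctly by the score, so AUC = 3/4.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(roc_auc_score(y_true, y_score))  # 0.75
```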
The main public modules and entry points are:
- `fsm_model.data_pipeline`
  - `build_bronze(raw_visits_path, raw_network_path, bronze_dir="data/bronze")`
  - `load_bronze(bronze_dir="data/bronze")`
- `fsm_model.feature_pipeline`
  - `build_silver_layer(bronze_visits_path, bronze_network_path)` – creates `data/silver/visits_silver.parquet` and `data/silver/network_silver.parquet`.
  - `build_gold_layer(visits_silver_path, network_graph_silver_path)` – creates `data/gold/visits_gold_final.parquet` and related artefacts.
- `fsm_model.training`
  - `run_training(gold_data, model_dir="models/lightgbm_v2", ...)` – time-ordered split, LightGBM training, evaluation, and model persistence.
  - `time_ordered_split`, `train_lightgbm`, `evaluate_model` – lower-level helpers.
- `fsm_model.inference`
  - `load_model_bundle(model_dir)` – loads model, feature list, and risk quantiles.
  - `prepare_features_from_gold(gold_data, feature_cols)`
  - `predict_from_gold(gold_data, model_dir, id_cols=None, threshold=0.5)`
- `fsm_model.interpretation`
  - `explain_visit_from_gold(gold_data, model_dir, task_id, visit_id, ...)` – SHAP-based explanation for a single visit.
  - `summarise_explanation_text(explanation)` – converts SHAP output into a short textual explanation.
The lower-level ingestion and utility helpers live in:
- `fsm_model.data_ingestion` – raw loaders and bronze I/O.
- `fsm_model.utils`, `fsm_model.eda_utils` – notebook plotting and display helpers.
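`time_ordered_split` is listed above only by name. A generic sketch of the idea follows, so that train/validation/test never leak future information backwards; the column name `timestamp` and the 70/15/15 fractions are assumptions, not the package's actual defaults.

```python
import pandas as pd

def time_ordered_split(df, ts_col="timestamp", val_frac=0.15, test_frac=0.15):
    """Split chronologically: validation and test strictly follow training in time."""
    df = df.sort_values(ts_col).reset_index(drop=True)
    n = len(df)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    train = df.iloc[: n - n_val - n_test]
    val = df.iloc[n - n_val - n_test : n - n_test]
    test = df.iloc[n - n_test :]
    return train, val, test

df = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=20, freq="D"),
    "y": range(20),
})
train, val, test = time_ordered_split(df)
print(len(train), len(val), len(test))  # 14 3 3
```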
The notebooks in the project root demonstrate and validate the pipeline:
- `01-data-ingestion.ipynb` – raw -> bronze.
- `02-EDA.ipynb` – exploratory analysis on bronze.
- `03-feature-silver-layer.ipynb` – silver feature engineering (earlier version of the code now in `fsm_model.feature_pipeline`).
- `04-feature-gold-layer.ipynb` – gold feature engineering (earlier version of the code now in `fsm_model.feature_pipeline`).
- `05-modeling.ipynb` / `06-model-inter.ipynb` – modelling and interpretability experiments using the packaged functions.
You can use these notebooks as templates for further experimentation or for building plots and tables for your presentation.
- Drop raw files under `data/raw/`.
- Run a small script or notebook that calls:

```python
import fsm_model as fsm

fsm.build_bronze()
fsm.build_silver_layer()
fsm.build_gold_layer()
summary = fsm.run_training("data/gold", model_dir="models/lightgbm_v3")
```

For batch predictions and per-visit explanations on the resulting Gold table:

```python
import fsm_model as fsm
import pandas as pd

visits_gold = pd.read_parquet("data/gold/visits_gold_final.parquet")

pred = fsm.predict_from_gold(visits_gold, model_dir="models/lightgbm_v3")

exp = fsm.explain_visit_from_gold(
    visits_gold,
    model_dir="models/lightgbm_v3",
    task_id="TASK123",
    visit_id=2,
    top_k=5,
)
print(exp["summary_text"])
```
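The `summary_text` printed above comes from `summarise_explanation_text`, which turns per-feature SHAP contributions into prose. A toy sketch of that idea is below; the helper name, feature names, and output wording are invented, not the package's actual implementation.

```python
def summarise_top_contributions(contribs, top_k=3):
    """Render the top-k contributions (by absolute value) as a short signed summary."""
    ranked = sorted(contribs.items(), key=lambda kv: abs(kv[1]), reverse=True)
    parts = [
        f"{name} {'raises' if value > 0 else 'lowers'} risk by {abs(value):.2f}"
        for name, value in ranked[:top_k]
    ]
    return "; ".join(parts)

# Invented SHAP-style contributions for one visit.
contribs = {
    "node_degree": 0.42,
    "visit_hour": -0.13,
    "tfidf_antenna": 0.08,
    "prior_fails": 0.30,
}
print(summarise_top_contributions(contribs))
```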