This repository implements an end-to-end machine learning pipeline for predicting the outcome of Field Service Management (FSM) visits. It transforms raw operational and network data into structured modelling tables, trains a binary classifier to predict FAIL vs SUCCESS, and provides local, human-readable explanations for each prediction. The pipeline is packaged as a reusable Python module (fsm_model) alongside notebooks for exploration and experimentation.
From the project root:
```shell
pip install -e .
```

This installs the `fsm_model` package in editable mode and pulls in the dependencies listed in `pyproject.toml` / `requirements.txt`.
Python 3.10+ is recommended.
The package assumes the original challenge data is present under data/raw/:
- `data/raw/visits.txt` – JSONL with one visit per line.
- `data/raw/network.adjlist` – undirected adjacency list of the network graph.
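For readers unfamiliar with these formats, the snippet below shows how such files parse with pandas and NetworkX. The records and node names are invented for illustration; the real `visits.txt` schema is not shown here.

```python
import io

import networkx as nx
import pandas as pd

# Invented two-record JSONL sample -- one JSON object per line.
jsonl = '{"task_id": "TASK0", "visit_id": 0}\n{"task_id": "TASK0", "visit_id": 1}\n'
visits = pd.read_json(io.StringIO(jsonl), lines=True)

# Adjacency-list format: each line is a node followed by its neighbours.
graph = nx.read_adjlist(io.BytesIO(b"A B C\nB D\n"))

print(visits.shape, graph.number_of_nodes(), graph.number_of_edges())
```

The same two calls (`pd.read_json(..., lines=True)` and `nx.read_adjlist(...)`) work directly on the paths under `data/raw/`.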
The full pipeline will then create:
- `data/bronze/` – lightly cleaned but minimally engineered data.
- `data/silver/` – visit-level and node-level features.
- `data/gold/` – final modelling tables and TF–IDF artefacts.
Once installed, the full pipeline can be run from Python using the high-level API.
```python
import fsm_model as fsm

# 1. Build Bronze layer (raw -> bronze)
fsm.build_bronze(
    raw_visits_path="data/raw/visits.txt",
    raw_network_path="data/raw/network.adjlist",
    bronze_dir="data/bronze",
)

# 2. Build Silver layer (bronze -> silver)
fsm.build_silver_layer()  # reads from data/bronze, writes to data/silver

# 3. Build Gold layer (silver -> gold)
fsm.build_gold_layer()  # reads from data/silver, writes to data/gold

# 4. Train LightGBM model (train / val / test)
summary = fsm.run_training(
    gold_data="data/gold",  # or "data/gold/visits_gold_final.parquet"
    model_dir="models/lightgbm_v3",
)
print(summary["val_metrics"]["roc_auc"], summary["test_metrics"]["roc_auc"])

# 5. Batch inference on Gold table
pred_df = fsm.predict_from_gold(
    gold_data="data/gold",  # or a DataFrame
    model_dir="models/lightgbm_v3",
)
print(pred_df.head())

# 6. Local interpretation for a single visit
exp = fsm.explain_visit_from_gold(
    gold_data="data/gold",
    model_dir="models/lightgbm_v3",
    task_id="TASK0",
    visit_id=0,
    top_k=5,
)
print(exp["summary_text"])
```

These steps mirror the notebook workflow, but through reusable functions.
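Step 4 prints ROC AUC for the validation and test splits. For reference, the metric itself can be reproduced directly with scikit-learn; the labels and scores below are invented, not taken from this pipeline.

```python
from sklearn.metrics import roc_auc_score

# Four invented visits (1 = FAIL). Three of the four FAIL/SUCCESS
# pairs are ranked correctly by the score, so AUC = 3/4.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(roc_auc_score(y_true, y_score))  # 0.75
```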
The main public modules and entry points are:
- `fsm_model.data_pipeline`
  - `build_bronze(raw_visits_path, raw_network_path, bronze_dir="data/bronze")`
  - `load_bronze(bronze_dir="data/bronze")`
- `fsm_model.feature_pipeline`
  - `build_silver_layer(bronze_visits_path, bronze_network_path)` – creates `data/silver/visits_silver.parquet` and `data/silver/network_silver.parquet`.
  - `build_gold_layer(visits_silver_path, network_graph_silver_path)` – creates `data/gold/visits_gold_final.parquet` and related artefacts.
- `fsm_model.training`
  - `run_training(gold_data, model_dir="models/lightgbm_v2", ...)` – time-ordered split, LightGBM training, evaluation, and model persistence.
  - `time_ordered_split`, `train_lightgbm`, `evaluate_model` – lower-level helpers.
- `fsm_model.inference`
  - `load_model_bundle(model_dir)` – loads model, feature list, and risk quantiles.
  - `prepare_features_from_gold(gold_data, feature_cols)`
  - `predict_from_gold(gold_data, model_dir, id_cols=None, threshold=0.5)`
- `fsm_model.interpretation`
  - `explain_visit_from_gold(gold_data, model_dir, task_id, visit_id, ...)` – SHAP-based explanation for a single visit.
  - `summarise_explanation_text(explanation)` – converts SHAP output into a short textual explanation.
The lower-level ingestion and utility helpers live in:
- `fsm_model.data_ingestion` – raw loaders and bronze I/O.
- `fsm_model.utils`, `fsm_model.eda_utils` – notebook plotting and display helpers.
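`time_ordered_split` is listed above only by name. A generic sketch of the idea follows, so that train/validation/test never leak future information backwards; the column name `timestamp` and the 70/15/15 fractions are assumptions, not the package's actual defaults.

```python
import pandas as pd

def time_ordered_split(df, ts_col="timestamp", val_frac=0.15, test_frac=0.15):
    """Split chronologically: validation and test strictly follow training in time."""
    df = df.sort_values(ts_col).reset_index(drop=True)
    n = len(df)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    train = df.iloc[: n - n_val - n_test]
    val = df.iloc[n - n_val - n_test : n - n_test]
    test = df.iloc[n - n_test :]
    return train, val, test

df = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=20, freq="D"),
    "y": range(20),
})
train, val, test = time_ordered_split(df)
print(len(train), len(val), len(test))  # 14 3 3
```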
The notebooks in the project root demonstrate and validate the pipeline:
- `01-data-ingestion.ipynb` – raw -> bronze.
- `02-EDA.ipynb` – exploratory analysis on bronze.
- `03-feature-silver-layer.ipynb` – silver feature engineering (earlier version of the code now in `fsm_model.feature_pipeline`).
- `04-feature-gold-layer.ipynb` – gold feature engineering (earlier version of the code now in `fsm_model.feature_pipeline`).
- `05-modeling.ipynb` / `06-model-inter.ipynb` – modelling and interpretability experiments using the packaged functions.
You can use these notebooks as templates for further experimentation or for building plots and tables for your presentation.
- Drop raw files under `data/raw/`.
- Run a small script or notebook that calls:

```python
import fsm_model as fsm

fsm.build_bronze()
fsm.build_silver_layer()
fsm.build_gold_layer()
summary = fsm.run_training("data/gold", model_dir="models/lightgbm_v3")
```

For batch predictions and per-visit explanations on the resulting Gold table:

```python
import fsm_model as fsm
import pandas as pd

visits_gold = pd.read_parquet("data/gold/visits_gold_final.parquet")

pred = fsm.predict_from_gold(visits_gold, model_dir="models/lightgbm_v3")

exp = fsm.explain_visit_from_gold(
    visits_gold,
    model_dir="models/lightgbm_v3",
    task_id="TASK123",
    visit_id=2,
    top_k=5,
)
print(exp["summary_text"])
```
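The `summary_text` printed above comes from `summarise_explanation_text`, which turns per-feature SHAP contributions into prose. A toy sketch of that idea is below; the helper name, feature names, and output wording are invented, not the package's actual implementation.

```python
def summarise_top_contributions(contribs, top_k=3):
    """Render the top-k contributions (by absolute value) as a short signed summary."""
    ranked = sorted(contribs.items(), key=lambda kv: abs(kv[1]), reverse=True)
    parts = [
        f"{name} {'raises' if value > 0 else 'lowers'} risk by {abs(value):.2f}"
        for name, value in ranked[:top_k]
    ]
    return "; ".join(parts)

# Invented SHAP-style contributions for one visit.
contribs = {
    "node_degree": 0.42,
    "visit_hour": -0.13,
    "tfidf_antenna": 0.08,
    "prior_fails": 0.30,
}
print(summarise_top_contributions(contribs))
```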