FSM Visit Outcome Model

This repository implements an end-to-end machine learning pipeline for predicting the outcome of Field Service Management (FSM) visits. It transforms raw operational and network data into structured modelling tables, trains a binary classifier to predict FAIL vs SUCCESS, and provides local, human-readable explanations for each prediction. The pipeline is packaged as a reusable Python module (fsm_model) alongside notebooks for exploration and experimentation.

1. Installation

From the project root:

pip install -e .

This installs the fsm_model package in editable mode and pulls in the dependencies listed in pyproject.toml / requirements.txt.

Python 3.10+ is recommended.

2. Data Expectations

The package assumes the original challenge data is present under data/raw/:

  • data/raw/visits.txt – JSONL with one visit per line.
  • data/raw/network.adjlist – undirected adjacency list of the network graph.

The full pipeline will then create:

  • data/bronze/ – lightly cleaned data with minimal feature engineering.
  • data/silver/ – visit-level and node-level features.
  • data/gold/ – final modelling tables and TF-IDF artefacts.
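Both raw formats can be parsed with the standard library alone. The sketch below assumes the usual conventions (one JSON object per line for JSONL; whitespace-separated neighbour lists for the adjacency file); the field names `task_id` and `outcome` are illustrative, not the confirmed schema of visits.txt:

```python
import json

# JSONL: one JSON object (one visit) per line.
# Field names here are illustrative assumptions.
line = '{"task_id": "TASK0", "outcome": "SUCCESS"}'
visit = json.loads(line)
print(visit["task_id"])  # TASK0

# Undirected adjacency list: "node neighbour1 neighbour2 ..." per line.
adjlist_text = "A B C\nB C\n"
edges = set()
for row in adjlist_text.splitlines():
    node, *neighbours = row.split()
    for nb in neighbours:
        # frozenset deduplicates undirected edges (A-B == B-A)
        edges.add(frozenset((node, nb)))
print(len(edges))  # 3 unique undirected edges
```

Inspect the first few lines of data/raw/visits.txt before relying on any particular field names.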

3. End-to-End Pipeline

Once installed, the full pipeline can be run from Python using the high-level API.

import fsm_model as fsm

# 1. Build Bronze layer (raw -> bronze)
fsm.build_bronze(
    raw_visits_path="data/raw/visits.txt",
    raw_network_path="data/raw/network.adjlist",
    bronze_dir="data/bronze",
)

# 2. Build Silver layer (bronze -> silver)
fsm.build_silver_layer()   # reads from data/bronze, writes to data/silver

# 3. Build Gold layer (silver -> gold)
fsm.build_gold_layer()     # reads from data/silver, writes to data/gold

# 4. Train LightGBM model (train / val / test)
summary = fsm.run_training(
    gold_data="data/gold",           # or "data/gold/visits_gold_final.parquet"
    model_dir="models/lightgbm_v3",
)
print(summary["val_metrics"]["roc_auc"], summary["test_metrics"]["roc_auc"])

# 5. Batch inference on Gold table
pred_df = fsm.predict_from_gold(
    gold_data="data/gold",           # or a DataFrame
    model_dir="models/lightgbm_v3",
)
print(pred_df.head())

# 6. Local interpretation for a single visit
exp = fsm.explain_visit_from_gold(
    gold_data="data/gold",
    model_dir="models/lightgbm_v3",
    task_id="TASK0",
    visit_id=0,
    top_k=5,
)
print(exp["summary_text"])

These steps mirror the notebook workflow through reusable package functions.


4. Package Structure (fsm_model)

The main public modules and entry points are:

  • fsm_model.data_pipeline
    • build_bronze(raw_visits_path, raw_network_path, bronze_dir="data/bronze")
    • load_bronze(bronze_dir="data/bronze")
  • fsm_model.feature_pipeline
    • build_silver_layer(bronze_visits_path, bronze_network_path) – creates data/silver/visits_silver.parquet and data/silver/network_silver.parquet.
    • build_gold_layer(visits_silver_path, network_graph_silver_path) – creates data/gold/visits_gold_final.parquet and related artefacts.
  • fsm_model.training
    • run_training(gold_data, model_dir="models/lightgbm_v2", ...) – time-ordered split, LightGBM training, evaluation, and model persistence.
    • time_ordered_split, train_lightgbm, evaluate_model – lower-level helpers.
  • fsm_model.inference
    • load_model_bundle(model_dir) – load model, feature list, and risk quantiles.
    • prepare_features_from_gold(gold_data, feature_cols)
    • predict_from_gold(gold_data, model_dir, id_cols=None, threshold=0.5)
  • fsm_model.interpretation
    • explain_visit_from_gold(gold_data, model_dir, task_id, visit_id, ...) – SHAP-based explanation for a single visit.
    • summarise_explanation_text(explanation) – converts SHAP output into a short textual explanation.
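The time-ordered split used by run_training avoids leakage by keeping validation and test visits strictly later than training visits. A minimal sketch of the idea, assuming a timestamp column (the name `visit_date` and the 70/15/15 fractions are illustrative; the real `fsm_model.training.time_ordered_split` may differ):

```python
import pandas as pd

def time_ordered_split(df, time_col="visit_date", frac_train=0.7, frac_val=0.15):
    """Sort by timestamp and cut by position, so later visits never
    leak into training. Column name and fractions are assumptions."""
    df = df.sort_values(time_col).reset_index(drop=True)
    n = len(df)
    n_train = int(n * frac_train)
    n_val = int(n * frac_val)
    return (
        df.iloc[:n_train],                    # earliest visits
        df.iloc[n_train:n_train + n_val],     # middle slice
        df.iloc[n_train + n_val:],            # most recent visits
    )
```

A random split would overstate performance here, because operational data drifts over time; evaluating on the most recent slice is closer to how the model would be used.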

The lower-level ingestion and utility helpers live in:

  • fsm_model.data_ingestion – raw loaders and bronze I/O.
  • fsm_model.utils, fsm_model.eda_utils – notebook plotting and display helpers.

5. Notebooks

The notebooks in the project root demonstrate and validate the pipeline:

  • 01-data-ingestion.ipynb – raw -> bronze.
  • 02-EDA.ipynb – exploratory analysis on bronze.
  • 03-feature-silver-layer.ipynb – silver feature engineering (earlier version of the code now in fsm_model.feature_pipeline).
  • 04-feature-gold-layer.ipynb – gold feature engineering (earlier version of the code now in fsm_model.feature_pipeline).
  • 05-modeling.ipynb / 06-model-inter.ipynb – modelling and interpretability experiments using the packaged functions.

You can use these notebooks as templates for further experimentation or for building plots and tables for your presentation.


6. Typical Usage Patterns

Train a fresh model from raw data

  1. Drop raw files under data/raw/.
  2. Run a small script or notebook that calls:
import fsm_model as fsm

fsm.build_bronze(
    raw_visits_path="data/raw/visits.txt",
    raw_network_path="data/raw/network.adjlist",
)
fsm.build_silver_layer()
fsm.build_gold_layer()
summary = fsm.run_training("data/gold", model_dir="models/lightgbm_v3")

Score and explain visits using an existing model

import fsm_model as fsm
import pandas as pd

visits_gold = pd.read_parquet("data/gold/visits_gold_final.parquet")
pred = fsm.predict_from_gold(visits_gold, model_dir="models/lightgbm_v3")

exp = fsm.explain_visit_from_gold(
    visits_gold,
    model_dir="models/lightgbm_v3",
    task_id="TASK123",
    visit_id=2,
    top_k=5,
)
print(exp["summary_text"])
