A simple, self-hostable API for resolving messy merchant strings (e.g., from bank statements) to canonical merchant objects.
Given a messy transaction descriptor like UBER * EATS PENDING, the API returns a structured merchant object:
{
"merchant_id": "uber",
"name": "Uber",
"category": "TRANSPORT",
"subcategory": "RIDE_SHARING",
"merchant_channel": "HYBRID",
"merchant_scope": "GLOBAL",
"subscription_likely": false,
"is_marketplace": true,
"risk_level": "LOW",
"website": "https://uber.com",
"logo_hint": {
"type": "DOMAIN",
"value": "uber.com"
},
"confidence": 0.85
}

Note: Products (Uber Eats, Uber Lime, etc.) resolve to the parent brand (Uber). There is one canonical merchant per brand.
- Node.js 22+
npm install

Development:
npm run dev

Production:
npm run build
npm start

The server listens on port 3000 by default. Set the PORT environment variable to change it.
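The port rule above could be expressed as the small helper below. This is only a sketch: `resolvePort` and `DEFAULT_PORT` are hypothetical names, and the range check is an assumption, since this README does not say how the server validates PORT.

```typescript
// Hypothetical helper: picks the listening port from an environment map.
const DEFAULT_PORT = 3000; // default stated in this README

function resolvePort(env: Record<string, string | undefined>): number {
  const raw = env["PORT"];
  // Fall back to the default when PORT is unset or blank.
  if (raw === undefined || raw.trim() === "") return DEFAULT_PORT;
  const port = Number(raw);
  // Assumed validation: reject anything that is not a usable TCP port.
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT value: ${raw}`);
  }
  return port;
}
```

In a Node.js process this would be called as `resolvePort(process.env)`.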
GET /v1/merchant/resolve?q=<string>
Parameters:
q (required): The merchant string to resolve
Response (match found):
{
"merchant_id": "uber",
"name": "Uber",
"category": "TRANSPORT",
"subcategory": "RIDE_SHARING",
"merchant_channel": "HYBRID",
"merchant_scope": "GLOBAL",
"subscription_likely": false,
"is_marketplace": true,
"risk_level": "LOW",
"website": "https://uber.com",
"logo_hint": {
"type": "DOMAIN",
"value": "uber.com"
},
"confidence": 0.96
}

Response (no match / low confidence):
{
"merchant_id": null,
"confidence": 0.42
}

GET /health
Returns { "status": "ok", "merchants": <count> }
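A client for the resolve endpoint might look like the sketch below. The names `buildResolveUrl`, `isMatch`, `resolveMerchant`, and `Fetcher` are hypothetical, and the code assumes that both match and no-match responses arrive with a 2xx status, which this README does not state. The fetch function is passed in so the sketch does not depend on any particular HTTP library.

```typescript
// Response shapes taken from the examples in this README (match fields trimmed).
interface ResolveMatch {
  merchant_id: string;
  name: string;
  category: string;
  confidence: number;
}
interface ResolveNoMatch {
  merchant_id: null;
  confidence: number;
}
type ResolveResponse = ResolveMatch | ResolveNoMatch;

// Minimal structural type that the global fetch satisfies.
type Fetcher = (url: string) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}>;

// Builds the request URL, percent-encoding the raw descriptor.
function buildResolveUrl(baseUrl: string, query: string): string {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return `${base}/v1/merchant/resolve?q=${encodeURIComponent(query)}`;
}

// A null merchant_id marks the no-match / low-confidence response.
function isMatch(response: ResolveResponse): response is ResolveMatch {
  return response.merchant_id !== null;
}

async function resolveMerchant(
  fetcher: Fetcher,
  baseUrl: string,
  query: string
): Promise<ResolveResponse> {
  const res = await fetcher(buildResolveUrl(baseUrl, query));
  if (!res.ok) throw new Error(`resolve failed with HTTP ${res.status}`);
  return (await res.json()) as ResolveResponse;
}
```

With a local server running, a call could be `resolveMerchant(fetch, "http://localhost:3000", "UBER * EATS PENDING")`, followed by an `isMatch` check before reading fields such as `name`.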
Each brand gets exactly one JSON file. Do NOT create separate files for products or sub-brands:
- uber.json - includes aliases for Uber, Uber Eats, Uber Lime, etc.
- amazon.json - includes aliases for Amazon, Amazon Prime, Amazon Fresh, etc.
Product differentiation is handled via aliases and regex patterns, not separate merchant files.
Create a new JSON file in src/data/merchants/:
{
"id": "acme",
"name": "ACME Corp",
"category": "SHOPPING",
"subcategory": "E_COMMERCE",
"merchant_channel": "ONLINE",
"merchant_scope": "NATIONAL",
"subscription_likely": false,
"is_marketplace": false,
"risk_level": "LOW",
"aliases": [
"acme",
"acme corp",
"acme corporation"
],
"regex": [
"^acme"
],
"website": "https://acme.com",
"logo_hint": {
"type": "DOMAIN",
"value": "acme.com"
}
}

| Field | Required | Type | Description |
|---|---|---|---|
| id | Yes | string | Unique identifier (lowercase, underscores) |
| name | Yes | string | Display name |
| category | Yes | enum | See categories below |
| subcategory | Yes | enum | See subcategories below |
| merchant_channel | Yes | enum | PHYSICAL, ONLINE, HYBRID |
| merchant_scope | Yes | enum | LOCAL, NATIONAL, GLOBAL |
| subscription_likely | Yes | boolean | Whether this merchant typically charges subscriptions |
| is_marketplace | Yes | boolean | Whether this is a marketplace (multiple sellers) |
| risk_level | Yes | enum | LOW, MEDIUM, HIGH |
| aliases | Yes | string[] | Pre-normalized strings to match against |
| regex | No | string[] | Regex patterns to match against normalized input |
| website | No | string | Official website URL |
| logo_hint | No | object | { type: "DOMAIN", value: "domain.com" } |
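The field table can be mirrored as a TypeScript type, which makes a missing required field easy to spot before a file is submitted. This is a sketch under assumptions: `MerchantFile`, `REQUIRED_FIELDS`, and `missingRequiredFields` are hypothetical names, not the project's own, and `subcategory` is typed as a plain string because its allowed values are not listed in this README.

```typescript
type Category =
  | "ATM" | "AUTOMOTIVE" | "DIGITAL_SERVICES" | "EDUCATION" | "ENTERTAINMENT"
  | "FEES_AND_CHARGES" | "FINANCE" | "FOOD_AND_DRINK" | "GIFTS_AND_DONATIONS"
  | "GOVERNMENT" | "GROCERIES" | "HEALTHCARE" | "HOME" | "INSURANCE"
  | "PERSONAL_CARE" | "PET_CARE" | "SERVICES" | "SHOPPING"
  | "SPORTS_AND_FITNESS" | "TRANSPORT" | "TRAVEL" | "UTILITIES";

type MerchantFile = {
  id: string;
  name: string;
  category: Category;
  subcategory: string; // enum in the project; values not listed here
  merchant_channel: "PHYSICAL" | "ONLINE" | "HYBRID";
  merchant_scope: "LOCAL" | "NATIONAL" | "GLOBAL";
  subscription_likely: boolean;
  is_marketplace: boolean;
  risk_level: "LOW" | "MEDIUM" | "HIGH";
  aliases: string[];
  regex?: string[];
  website?: string;
  logo_hint?: { type: string; value: string }; // only "DOMAIN" appears in this README
};

// Fields marked "Yes" in the Required column.
const REQUIRED_FIELDS = [
  "id", "name", "category", "subcategory", "merchant_channel",
  "merchant_scope", "subscription_likely", "is_marketplace",
  "risk_level", "aliases",
] as const;

// Returns the names of required fields that are absent from a parsed JSON object.
function missingRequiredFields(candidate: object): string[] {
  const record = candidate as Record<string, unknown>;
  return REQUIRED_FIELDS.filter((field) => record[field] === undefined);
}

// The ACME example from this README, with the optional fields omitted.
const acme: MerchantFile = {
  id: "acme",
  name: "ACME Corp",
  category: "SHOPPING",
  subcategory: "E_COMMERCE",
  merchant_channel: "ONLINE",
  merchant_scope: "NATIONAL",
  subscription_likely: false,
  is_marketplace: false,
  risk_level: "LOW",
  aliases: ["acme", "acme corp", "acme corporation"],
};
```

Calling `missingRequiredFields(JSON.parse(fileContents))` on a new merchant file returns an empty array when every required field is present.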
Aliases must be pre-normalized:
- Lowercase only (no uppercase)
- No punctuation (spaces allowed)
- No diacritics or special characters
- No noise words (pending, ref, transaction, etc.)
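The four rules above can be checked mechanically. The sketch below is an illustration only: `aliasProblems` is a hypothetical helper, the noise-word list contains just the three words this README names, and treating digits as allowed is an assumption.

```typescript
// Noise words named in this README; the project's real list is likely longer.
const ALIAS_NOISE_WORDS = ["pending", "ref", "transaction"];

// Returns a list of rule violations; an empty list means the alias looks pre-normalized.
function aliasProblems(alias: string): string[] {
  const problems: string[] = [];

  // Rule 1: lowercase only.
  if (alias !== alias.toLowerCase()) problems.push("uppercase");

  // Rule 2: no punctuation (ASCII punctuation ranges; spaces stay allowed).
  if (/[!-\/:-@\[-`{-~]/.test(alias)) problems.push("punctuation");

  // Rule 3: no diacritics or special characters (anything outside ASCII).
  if (/[^\x00-\x7f]/.test(alias)) problems.push("diacritics or special characters");

  // Rule 4: no noise words among the space-separated tokens.
  const tokens = alias.toLowerCase().split(" ");
  const noisy = tokens.filter((t) => ALIAS_NOISE_WORDS.indexOf(t) !== -1);
  if (noisy.length > 0) problems.push("noise word: " + noisy.join(", "));

  return problems;
}
```

A check like this could run over every alias in a merchant file before it is committed.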
Examples:
- "uber eats" (correct)
- "Uber Eats" (wrong - uppercase)
- "netflix.com" (wrong - punctuation)
Regex patterns should identify brand families, not specific products:
- "^uber" (correct - matches all Uber products)
- "^amazon" (correct - matches all Amazon products)
- "uber eats" (wrong - too specific, use aliases instead)
Categories:
- ATM - ATM transactions
- AUTOMOTIVE - Gas stations, car washes, repairs, parts
- DIGITAL_SERVICES - Software, cloud services
- EDUCATION - Tuition, courses, books
- ENTERTAINMENT - Streaming, gaming, movies, music
- FEES_AND_CHARGES - Bank fees, service fees
- FINANCE - Banking, investments, loans
- FOOD_AND_DRINK - Cafes, restaurants, bars, fast food
- GIFTS_AND_DONATIONS - Charity, gifts
- GOVERNMENT - Taxes, licenses, fines
- GROCERIES - Supermarkets, convenience stores
- HEALTHCARE - Hospitals, dental, vision, pharmacy
- HOME - Improvement, furniture, appliances, rent
- INSURANCE - Auto, health, home, life insurance
- PERSONAL_CARE - Salons, spas, cosmetics
- PET_CARE - Veterinary, pet supplies, grooming
- SERVICES - Professional, household, legal
- SHOPPING - E-commerce, department stores, clothing
- SPORTS_AND_FITNESS - Gyms, equipment, events
- TRANSPORT - Ride-sharing, taxis, public transport
- TRAVEL - Hotels, flights, vacation rentals
- UTILITIES - Electricity, gas, water, internet, phone

Merchant channel (merchant_channel):
- PHYSICAL - Brick-and-mortar only
- ONLINE - Online only
- HYBRID - Both physical and online

Merchant scope (merchant_scope):
- LOCAL - Single city/region
- NATIONAL - Single country
- GLOBAL - Multiple countries

Risk level (risk_level):
- LOW - Well-known, reputable merchant
- MEDIUM - Less established or niche
- HIGH - Higher fraud risk category

How matching works:
- Input is normalized: Unicode normalization, diacritic folding, lowercase, removal of noise words (pending, ref, transaction, etc.)
- Each merchant's regex patterns are tested against the normalized input
- Each merchant's aliases are compared using token similarity
- A confidence score is calculated
- If confidence >= 0.70, the best match is returned
- Otherwise, a no-match response with the best confidence is returned
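The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the Jaccard token overlap, the 0.8 floor applied on a regex hit, and every name (`normalize`, `tokenSimilarity`, `scoreMerchant`, `resolve`, `MerchantEntry`) are assumptions. Only the listed noise words and the 0.70 threshold come from this README.

```typescript
// Hypothetical, trimmed-down merchant shape for this sketch.
interface MerchantEntry {
  id: string;
  aliases: string[];
  regex?: string[];
}

interface ResolveResult {
  merchant_id: string | null;
  confidence: number;
}

// Noise words named in this README; the real list is likely longer.
const NOISE_WORDS = new Set(["pending", "ref", "transaction"]);
const THRESHOLD = 0.7; // from this README: confidence >= 0.70 returns a match

function normalize(input: string): string {
  return input
    .normalize("NFKD")               // Unicode normalization (splits off accents)
    .replace(/[\u0300-\u036f]/g, "") // diacritic folding: drop combining marks
    .toLowerCase()
    .replace(/[^a-z0-9 ]+/g, " ")    // anything else becomes a space
    .split(" ")
    .filter((t) => t.length > 0 && !NOISE_WORDS.has(t))
    .join(" ");
}

// Jaccard overlap of token sets: an assumed stand-in for "token similarity".
function tokenSimilarity(a: string, b: string): number {
  const ta = new Set(a.split(" "));
  const tb = new Set(b.split(" "));
  let shared = 0;
  ta.forEach((t) => { if (tb.has(t)) shared += 1; });
  const union = ta.size + tb.size - shared;
  return union === 0 ? 0 : shared / union;
}

function scoreMerchant(merchant: MerchantEntry, normalized: string): number {
  let best = 0;
  for (const alias of merchant.aliases) {
    best = Math.max(best, tokenSimilarity(alias, normalized));
  }
  const regexHit = (merchant.regex ?? []).some((p) => new RegExp(p).test(normalized));
  // Assumed weighting: a regex hit lifts the score to at least 0.8.
  return regexHit ? Math.max(best, 0.8) : best;
}

function resolve(input: string, merchants: MerchantEntry[]): ResolveResult {
  const normalized = normalize(input);
  let bestId: string | null = null;
  let bestScore = 0;
  for (const merchant of merchants) {
    const score = scoreMerchant(merchant, normalized);
    if (score > bestScore) { bestScore = score; bestId = merchant.id; }
  }
  // Below the threshold, report no match but keep the best confidence seen.
  return bestScore >= THRESHOLD
    ? { merchant_id: bestId, confidence: bestScore }
    : { merchant_id: null, confidence: bestScore };
}
```

With a single entry such as `{ id: "uber", aliases: ["uber", "uber eats"], regex: ["^uber"] }`, the descriptor `UBER * EATS PENDING` normalizes to `uber eats` and resolves to `uber`, while an unrelated string falls below the threshold and yields `merchant_id: null`.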
See CONTRIBUTING.md for guidelines on adding merchants and submitting changes.
Free to use, modify, and self-host. No commercial use. See LICENSE.