This project provides an automated audio transcription pipeline built on Azure services. Audio files are retrieved with youtube-dl in M4A format.
- Upload audio files to Azure Blob Storage, transcribe them automatically with Azure Speech Service (with optional speaker diarization), and load the transcripts into Azure AI Foundry agents for interactive Q&A.
- Use the resulting transcripts to populate the Researcher Agent and get a comprehensive overview.
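The upload step can be sketched in Python. The container name `audio-input`, the metadata keys, and the `AZURE_STORAGE_CONNECTION_STRING` environment variable below are illustrative assumptions, not names fixed by this project:

```python
import os


def build_metadata(topic: str, diarization: bool) -> dict:
    """Blob metadata consumed downstream by the transcription function.

    The key names are assumptions; note that Azure blob metadata
    values must be strings, so the boolean is lowercased.
    """
    return {"topic": topic, "diarization": str(diarization).lower()}


def upload_audio(path: str, topic: str, diarization: bool = True) -> None:
    # Imported lazily so build_metadata stays usable without the SDK
    # installed (pip install azure-storage-blob).
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(
        os.environ["AZURE_STORAGE_CONNECTION_STRING"]
    )
    blob = service.get_blob_client("audio-input", os.path.basename(path))
    with open(path, "rb") as f:
        blob.upload_blob(
            f, overwrite=True, metadata=build_metadata(topic, diarization)
        )
```

Uploading the blob with its metadata in one call is what lets the blob-triggered function pick up the `topic` and `diarization` settings without a separate lookup.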
```
┌─────────────┐      ┌──────────────────┐      ┌─────────────────┐
│ Audio File  │─────▶│   Blob Trigger   │─────▶│ Speech Service  │
│ (+ metadata)│      │    (Function)    │      │ (Transcription) │
└─────────────┘      └──────────────────┘      └─────────────────┘
                                                        │
                               destinationContainerUrl  │
                          (Speech writes JSON directly) │
                                                        ▼
┌─────────────┐      ┌──────────────────┐      ┌─────────────────┐
│  AI Agent   │◀─────│   Blob Trigger   │◀─────│ Transcript JSON │
│ (Q&A ready) │      │    (Function)    │      │ + .txt file     │
└─────────────┘      └──────────────────┘      └─────────────────┘
```
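The second blob trigger's job is to turn the Speech Service's JSON into plain text. A minimal parsing sketch, assuming the standard batch-transcription output schema (`combinedRecognizedPhrases` for the full display text, per-phrase `speaker` fields when diarization was enabled); the function name is illustrative:

```python
import json


def transcript_to_text(transcript_json: str) -> str:
    """Extract plain text from an Azure batch-transcription result.

    `combinedRecognizedPhrases` holds the full display text per audio
    channel; `recognizedPhrases` carries a `speaker` number when
    diarization was enabled.
    """
    doc = json.loads(transcript_json)
    phrases = doc.get("recognizedPhrases", [])
    if any("speaker" in p for p in phrases):
        # Diarized: emit one line per phrase, labelled by speaker.
        return "\n".join(
            f"Speaker {p['speaker']}: {p['nBest'][0]['display']}"
            for p in phrases
            if p.get("nBest")
        )
    # Single speaker: the combined display text is enough.
    return "\n".join(
        c["display"] for c in doc.get("combinedRecognizedPhrases", [])
    )
```

The trigger would call this on the JSON blob's content and write the result to a `.txt` file named after the original audio file.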
- Local Development: uses polling to wait for transcription completion
- Azure Deployment: uses `destinationContainerUrl` so the Speech Service writes JSON directly to the transcripts container
- The blob trigger parses the JSON, extracts the transcript text, and saves a `.txt` file with the original audio filename
- The `.txt` file is then uploaded to Azure AI Foundry for agent-based Q&A
- Azure Speech Service supports WAV, MP3, OGG/OPUS, FLAC, AMR, and WEBM (NOT M4A; use `make convert-audio`)
- Transcripts are saved as `.txt` files with the same name as the source audio file
- `diarization: true` enables speaker separation; `false` is for a single speaker
- `topic`: groups transcripts under the same AI agent (e.g., "project-planning")
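For reference, a job exercising both deployment modes might be created and tracked as below. This is a sketch, not the project's actual code: the `v3.2` API version, the locale, and the helper names are assumptions, and the SAS URLs are placeholders you would generate yourself:

```python
import json
import time
import urllib.request


def build_transcription_payload(audio_sas_url: str,
                                destination_container_url: str,
                                diarization: bool) -> dict:
    """Request body for the Speech batch-transcription REST API."""
    return {
        "contentUrls": [audio_sas_url],
        "locale": "en-US",                       # assumed locale
        "displayName": "pipeline-transcription",
        "properties": {
            "diarizationEnabled": diarization,
            # Speech writes the result JSON straight into this container,
            # which is what lets the Azure deployment skip polling.
            "destinationContainerUrl": destination_container_url,
        },
    }


def wait_for_transcription(status_url: str, key: str,
                           poll_seconds: int = 15) -> str:
    """Local-development path: poll the job until it reaches a
    terminal status ("Succeeded" or "Failed")."""
    while True:
        req = urllib.request.Request(
            status_url, headers={"Ocp-Apim-Subscription-Key": key}
        )
        with urllib.request.urlopen(req) as resp:
            status = json.load(resp)["status"]
        if status in ("Succeeded", "Failed"):
            return status
        time.sleep(poll_seconds)
```

Setting `destinationContainerUrl` is what collapses the Azure deployment into the event-driven flow shown in the diagram: the second blob trigger fires on the JSON instead of anyone polling.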
Run `make help` to see all available commands.