Name: ASR & Diarization Fundamentals Bootcamp for Media Archives
SKU: 162b96c498ef
Availability: InStock

Description
Learn the building blocks of media archive search with automatic speech recognition (ASR) and speaker diarization. You will understand how audio is segmented, how voice activity detection (VAD) finds spoken regions, and why chunk size matters for retrieval. We explain the difference between offline batch transcription and streaming modes, and when each approach is appropriate. Hands-on labs show how to align words with timestamps, generate sentence-level segments, and attach speaker labels reliably. You will practice turning raw ASR output into searchable JSON with timecodes, confidence scores, and topic tags. A troubleshooting module tackles crosstalk, background music, and clipping so your transcripts remain usable. We cover ethical considerations, including consent, disclosure, and expectations for handling sensitive recordings. A final walkthrough demonstrates wiring transcripts into a search index with snippet previews and jump-to-time links. By the end, you will have a working baseline pipeline that can process both historic and newly ingested media. You will also know how to measure quality with word error rate (WER) and diarization error rate (DER), and how to plan the next round of improvements.
Format
Video lessons, lab notebooks (Jupyter), JSON schema templates, sample audio set, evaluation worksheets
Duration
4 hours self-paced + optional labs
What You’ll Learn
– VAD, segmentation, and chunking
– ASR alignment & timestamps
– Basic diarization labeling
– JSON schema for search
– Snippet generation & jump links
– Intro to WER/DER metrics
Target Audience
Archivists, media librarians, data engineers, and product teams starting an audio/video search initiative
