Transcribe or translate audio from a microphone, a file, or YouTube
Transcribe audio to text with speaker diarization