metadata

base_model: openai/whisper-large-v3-turbo
datasets:
  - bn
language: bn
library_name: transformers
license: apache-2.0
model-index:
  - name: Finetuned openai/whisper-large-v3-turbo on Bengali
    results:
      - task:
          type: automatic-speech-recognition
          name: Speech-to-Text
        dataset:
          name: Common Voice (Bengali)
          type: common_voice
        metrics:
          - type: wer
            value: 11.053

This model is openai/whisper-large-v3-turbo finetuned on 21,409 Bengali training audio samples from the Common Voice release cv-corpus-21.0-2025-03-14/bn.

This model was created from the Mozilla.ai Blueprint: speech-to-text-finetune.
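
For readers who want to try the model, a minimal inference sketch using the transformers `pipeline` API might look like the following. The repo ID `your-username/whisper-large-v3-turbo-bn` and the helper name `transcribe` are hypothetical placeholders and do not come from this card; substitute the actual Hub ID or a local checkpoint directory.

```python
# Hypothetical placeholder: replace with the real Hub repo ID or a local checkpoint path.
MODEL_ID = "your-username/whisper-large-v3-turbo-bn"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe a single audio file to Bengali text."""
    # Imported lazily so the module can be loaded without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # Pin the language and task so Whisper does not auto-detect or translate.
    result = asr(
        audio_path,
        generate_kwargs={"language": "bengali", "task": "transcribe"},
    )
    return result["text"]
```

Calling `transcribe("sample.wav")` (with `sample.wav` standing in for any local audio file) would download the checkpoint on first use and return the transcript as a string.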

Evaluation results on 9,363 Bengali audio samples:

Baseline model (before finetuning) on Bengali

Word Error Rate (Normalized): 78.843
Word Error Rate (Orthographic): 107.027
Character Error Rate (Normalized): 62.521
Character Error Rate (Orthographic): 72.012
Loss: 1.074
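
The baseline orthographic WER above is larger than 100, which is possible because insertions count as errors while the denominator is only the number of reference words. The sketch below illustrates how word and character error rates are typically computed, and how scoring after text normalization differs from scoring the raw (orthographic) text. The `normalize` function is a simplified assumption (punctuation removal, lowercasing, whitespace collapsing) and is not necessarily the normalizer used to produce the numbers in this card; all function names are illustrative.

```python
import unicodedata


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitution, insertion, deletion cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalize(text):
    """Assumed, simplified normalizer: drop punctuation, lowercase, collapse spaces."""
    kept = "".join(
        " " if unicodedata.category(ch).startswith("P") else ch for ch in text
    )
    return " ".join(kept.lower().split())


def error_rate(ref_units, hyp_units):
    """Edit distance as a percentage of the reference length."""
    if not ref_units:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref_units, hyp_units) / len(ref_units)


def wer(reference, hypothesis, normalized=False):
    """Word error rate; orthographic by default, normalized on request."""
    if normalized:
        reference, hypothesis = normalize(reference), normalize(hypothesis)
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis, normalized=False):
    """Character error rate over code points, spaces included."""
    if normalized:
        reference, hypothesis = normalize(reference), normalize(hypothesis)
    return error_rate(list(reference), list(hypothesis))


# A missing sentence-final danda is an error orthographically but not after normalization.
ref, hyp = "আমি ভাত খাই।", "আমি ভাত খাই"
print(wer(ref, hyp), wer(ref, hyp, normalized=True))
```

This sketch scores a single utterance; corpus-level figures are usually obtained by summing errors and reference lengths over all utterances before dividing.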

Finetuned model (after finetuning) on Bengali

Word Error Rate (Normalized): 11.053
Word Error Rate (Orthographic): 26.436
Character Error Rate (Normalized): 6.059
Character Error Rate (Orthographic): 7.537
Loss: 0.109
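
To put the two sets of results side by side, the relative error reduction for each metric can be derived from the values reported above. The following is a small arithmetic sketch; the dictionary and function names are illustrative, and the numbers are copied directly from the baseline and finetuned lists.

```python
# Values copied from the baseline and finetuned results reported in this card.
BASELINE = {
    "WER (normalized)": 78.843,
    "WER (orthographic)": 107.027,
    "CER (normalized)": 62.521,
    "CER (orthographic)": 72.012,
}
FINETUNED = {
    "WER (normalized)": 11.053,
    "WER (orthographic)": 26.436,
    "CER (normalized)": 6.059,
    "CER (orthographic)": 7.537,
}


def relative_reduction(before: float, after: float) -> float:
    """Percentage of the baseline error removed by finetuning."""
    return 100.0 * (before - after) / before


reductions = {
    name: relative_reduction(BASELINE[name], FINETUNED[name]) for name in BASELINE
}

for name, value in reductions.items():
    print(f"{name}: {value:.1f}% relative reduction")
```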