metadata
license: apache-2.0
language:
- th
- en
base_model:
- openai/whisper-large-v3
pipeline_tag: automatic-speech-recognition
library_name: transformers
metrics:
- wer
Pathumma Whisper Large V3 (TH)
More information needed
Model Description
More information needed
Quickstart
More information needed
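Pending official instructions, a minimal sketch of transcribing Thai audio with the `transformers` ASR pipeline might look like the following. The repository ID `nectec/Pathumma-whisper-th-large-v3`, the helper names, and the 30-second chunk length are assumptions for illustration and are not confirmed by this card; adjust them to the actual published checkpoint.

```python
# Assumed Hub repository ID; replace with the actual published checkpoint.
MODEL_ID = "nectec/Pathumma-whisper-th-large-v3"


def build_generate_kwargs(language="th", task="transcribe"):
    """Decoding options passed to Whisper's generate(): force Thai transcription."""
    return {"language": language, "task": task}


def transcribe(audio_path, model_id=MODEL_ID):
    """Transcribe one audio file and return the recognized text."""
    # Heavy dependencies are imported here so the helpers above stay importable
    # on machines without torch/transformers installed.
    import torch
    from transformers import pipeline

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    asr = pipeline(
        task="automatic-speech-recognition",
        model=model_id,
        device=device,
        chunk_length_s=30,  # assumed: split long recordings into 30 s windows
    )
    result = asr(audio_path, generate_kwargs=build_generate_kwargs())
    return result["text"]
```

Calling `transcribe("sample.wav")` (a hypothetical local file) downloads the checkpoint on first use and returns the transcript as a string.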
Evaluation Performance
Model | WER (CV18) | WER (Gowejee) | WER (LOTUS-TRD) | WER (Thai Dialect) | WER (Elderly) | WER (Gigaspeech2) | WER (Fleurs) | WER (Distant Meeting) | WER (Podcast) |
---|---|---|---|---|---|---|---|---|---|
whisper-large-v3 | 18.75 | 46.59 | 48.14 | 57.82 | 12.27 | 33.26 | 24.08 | 72.57 | 41.24 |
airesearch-wav2vec2-large-xlsr-53-th | 8.49 | 17.28 | 63.01 | 48.53 | 11.29 | 52.72 | 37.32 | 85.11 | 65.12 |
thonburian-whisper-th-large-v3-combined | 7.62 | 22.06 | 41.95 | 26.53 | 1.63 | 25.22 | 13.90 | 64.68 | 32.42 |
monsoon-whisper-medium-gigaspeech2 | 11.66 | 20.50 | 41.04 | 42.06 | 7.57 | 21.40 | 21.54 | 51.65 | 38.89 |
pathumma-whisper-th-large-v3 | 8.68 | 9.84 | 15.47 | 19.85 | 1.53 | 21.66 | 15.65 | 51.56 | 36.47 |
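For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words; lower is better, and the values above appear to be percentages. The following is a minimal sketch of that calculation, not the evaluation pipeline used for this table. It assumes the text is already split into space-separated words. Thai is written without spaces between words, so a word segmenter would be needed first, and this card does not state which one was used.

```python
def word_error_rate(reference, hypothesis):
    """Return WER as a percentage for two space-tokenized strings."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")

    # Levenshtein distance over words, keeping only the previous DP row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return 100.0 * prev[-1] / len(ref)


# Pre-segmented Thai example: one substituted word out of three.
score = word_error_rate("ฉัน ไป โรงเรียน", "ฉัน ไป โรงแรม")
```

Because the denominator is the reference length, WER can exceed 100 when the hypothesis contains many inserted words.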
Limitations
More information needed
Acknowledgements
We thank the research teams who created open speech models, including AIResearch, BiodatLab, Looloo Technology, SCB 10X, and OpenAI. We are grateful to Dr. Titipat Achakulwisut of BiodatLab for the evaluation pipeline, and to ThaiSC (NSTDA Supercomputer Centre) for providing the LANTA supercomputer used for model training, fine-tuning, and evaluation.
Pathumma Audio Team
Pattara Tipkasorn, Wayupuk Sommuang, Oatsada Chatthong, Kwanchiva Thangthai
Citation
More information needed