---
license: apache-2.0
language:
  - th
  - en
base_model:
  - openai/whisper-large-v3
pipeline_tag: automatic-speech-recognition
library_name: transformers
metrics:
  - wer
---

# Pathumma Whisper Large V3 (TH)

More information needed

## Model Description

More information needed

## Quickstart

More information needed
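
Until an official example is published, the following is a minimal sketch using the Hugging Face `transformers` ASR pipeline. The repository ID, the audio file name, and the generation settings are assumptions and may need to be adjusted.

```python
import torch
from transformers import pipeline

# Repository ID assumed from the model name; replace with the actual ID if it differs.
MODEL_ID = "nectec/Pathumma-whisper-th-large-v3"

device = "cuda:0" if torch.cuda.is_available() else "cpu"

asr = pipeline(
    task="automatic-speech-recognition",
    model=MODEL_ID,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device=device,
)

# Transcribe a local audio file (the path is a placeholder).
result = asr(
    "sample_th.wav",
    generate_kwargs={"language": "th", "task": "transcribe"},
    return_timestamps=True,  # required by the pipeline for audio longer than 30 s
)
print(result["text"])
```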

## Evaluation Performance

All scores are word error rate (WER); lower is better.

| Model | WER (CV18) | WER (Gowejee) | WER (LOTUS-TRD) | WER (Thai Dialect) | WER (Elderly) | WER (Gigaspeech2) | WER (Fleurs) | WER (Distant Meeting) | WER (Podcast) |
|---|---|---|---|---|---|---|---|---|---|
| whisper-large-v3 | 18.75 | 46.59 | 48.14 | 57.82 | 12.27 | 33.26 | 24.08 | 72.57 | 41.24 |
| airesearch-wav2vec2-large-xlsr-53-th | 8.49 | 17.28 | 63.01 | 48.53 | 11.29 | 52.72 | 37.32 | 85.11 | 65.12 |
| thonburian-whisper-th-large-v3-combined | 7.62 | 22.06 | 41.95 | 26.53 | 1.63 | 25.22 | 13.90 | 64.68 | 32.42 |
| monsoon-whisper-medium-gigaspeech2 | 11.66 | 20.50 | 41.04 | 42.06 | 7.57 | 21.40 | 21.54 | 51.65 | 38.89 |
| pathumma-whisper-th-large-v3 | 8.68 | 9.84 | 15.47 | 19.85 | 1.53 | 21.66 | 15.65 | 51.56 | 36.47 |
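
For reference, the sketch below shows how WER is commonly computed for Thai with the `evaluate` library. Thai text has no word delimiters, so hypotheses and references are word-segmented first; the tokenizer choice here (PyThaiNLP's newmm engine) is an assumption, and the exact evaluation pipeline used for the table above may differ.

```python
import evaluate
from pythainlp.tokenize import word_tokenize

wer_metric = evaluate.load("wer")

def segment(text: str) -> str:
    # Insert spaces between Thai words so WER is computed at the word level.
    return " ".join(word_tokenize(text, engine="newmm"))

# Placeholder transcripts; in practice these come from a test set and the model output.
references = ["สวัสดีครับ วันนี้อากาศดีมาก"]
predictions = ["สวัสดีครับวันนี้อากาศดี"]

wer = wer_metric.compute(
    predictions=[segment(p) for p in predictions],
    references=[segment(r) for r in references],
)
print(f"WER: {wer:.4f}")
```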

## Limitations

More information needed

## Acknowledgements

We extend our appreciation to the research teams involved in creating the open speech models used in this work, including AIResearch, BiodatLab, Looloo Technology, SCB 10X, and OpenAI. We thank Dr. Titipat Achakulwisut of BiodatLab for the evaluation pipeline, and ThaiSC (the NSTDA Supercomputer Center) for providing the LANTA cluster used for model training, fine-tuning, and evaluation.

## Pathumma Audio Team

Pattara Tipkasorn, Wayupuk Sommuang, Oatsada Chatthong, Kwanchiva Thangthai

## Citation

More information needed