AiLab-IMCS-UL/whisper-large-v3-latgalian-2503

Automatic Speech Recognition

normundsg committed on Mar 24

Commit 0426880 · verified · 1 parent: 5892446

Update README.md

Files changed (1)

README.md +25 -1

@@ -3,4 +3,28 @@ license: apache-2.0
 base_model:
 - AiLab-IMCS-UL/whisper-large-v3-lv-late-cv19
 pipeline_tag: automatic-speech-recognition
----
+---
+# General-purpose Latgalian ASR model
+This is a fine-tuned [whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) model for [Latgalian](https://en.wikipedia.org/wiki/Latgalian_language), trained by [AiLab.lv](https://ailab.lv) using two general-purpose speech datasets:
+- the Latgalian part of [Common Voice 20.0](https://commonvoice.mozilla.org/ltg/datasets),
+- the Corpus of Contemporary Latgalian Speech [MuLaR](https://korpuss.lv/id/MuLaR).
+## Training
+As a base model, we used a previously fine-tuned ASR model for [Latvian](https://huggingface.co/AiLab-IMCS-UL/whisper-large-v3-lv-late-cv19), and continued to fine-tune it for Latgalian. The fine-tuning was done using the Hugging Face Transformers library.
+| Training data | Hours |
+|:---|---:|
+| Latgalian Common Voice 20.0 train set (a [VW split](https://analyzer.cv-toolbox.web.tr)) | 22.9 |
+| Corpus of Contemporary Latgalian Speech (MuLaR) train set | 17.3 |
+| Total | 40.2 |
+## Evaluation
+TBA
+## Acknowledgements
+This work was supported by the EU Recovery and Resilience Facility project [Language Technology Initiative](https://www.vti.lu.lv) (2.3.1.1.i.0/1/22/I/CFLA/002) in synergy with the State Research Programme project [LATE](https://www.digitalhumanities.lv/projekti/vpp-late/) (VPP-LETONIKA-2021/1-0006).
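
The README added in this commit does not yet include a usage example. The following is a minimal sketch of how the model might be run, assuming the checkpoint loads through the standard Transformers `pipeline` API like other Whisper fine-tunes. The helper name `transcribe` and the file name `sample.wav` are hypothetical and not part of the repository.

```python
# Minimal inference sketch (an assumption, not the authors' documented usage).
# The model ID is taken from the repository name shown on this commit page.
MODEL_ID = "AiLab-IMCS-UL/whisper-large-v3-latgalian-2503"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with the Transformers ASR pipeline."""
    # Imported lazily so that defining this helper does not require
    # transformers to be installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # chunk_length_s splits long recordings into 30-second windows,
    # which matches Whisper's native input length.
    result = asr(audio_path, chunk_length_s=30)
    return result["text"]


# Hypothetical usage (downloads the model on first call):
#   print(transcribe("sample.wav"))
```

Building the pipeline inside the function keeps the sketch short; for batch transcription it would likely be better to create the pipeline once and reuse it across files.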