Commit af5371e by Kimang18 · verified · Parent: 9441809

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +37 -15
README.md CHANGED
@@ -1,28 +1,51 @@
  ---
  library_name: mlx
  license: apache-2.0
  datasets:
- - google/fleurs
- - seanghay/khmer_mpwt_speech
  - seanghay/km-speech-corpus
- - openslr/openslr
- metrics:
- - wer
  tags:
- - mlx
  - Khmer
  ---

  # whisper-tiny-khmer-mlx-fp32
- This model was converted to MLX format from [`openai-whisper-tiny`](https://github.com/openai/whisper), then fine tuned to Khmer language using two datasets:
- - [seanghay/khmer_mpwt_speech](https://huggingface.co/datasets/seanghay/khmer_mpwt_speech)
- - [seanghay/km-speech-corpus](https://huggingface.co/datasets/seanghay/km-speech-corpus)

  It achieves the following __word error rate__ (`wer`) on 2 popular datasets:
- - 0.938 on [google/fleurs](https://huggingface.co/datasets/google/fleurs) `km-kh`, `test` split
- - 0.697 on [openslr/openslr](https://huggingface.co/datasets/openslr/openslr) `SLR42`, `train` split

- __NOTE__ MLX format is usable for M-chip series of Apple.

  ## Use with mlx
  ```bash
@@ -38,12 +61,11 @@ result = mlx_whisper.transcribe(
  path_or_hf_repo="Kimang18/whisper-tiny-khmer-mlx-fp32",
  fp16=False
  )
- print(result['text']) # print khmer text in SPEECH_FILE_NAME
  ```
  Then execute this script `example.py` to see the result.

  You can also use command line in terminal
-
  ```bash
  mlx_whisper --model Kimang18/whisper-tiny-khmer-mlx-fp32 --task transcribe SPEECH_FILE_NAME --fp16 False
- ```
 
  ---
  library_name: mlx
  license: apache-2.0
+ language:
+ - km
+ pipeline_tag: automatic-speech-recognition
  datasets:
  - seanghay/km-speech-corpus
+ - seanghay/khmer_mpwt_speech
  tags:
  - Khmer
+ - mlx
+ base_model: openai-whisper-tiny
+ model-index:
+ - name: whisper-tiny-khmer-mlx-fp32 by Kimang KHUN
+   results:
+   - task:
+       type: automatic-speech-recognition
+       name: Speech Recognition
+     dataset:
+       name: test split of "km_kh" in google/fleurs
+       type: google/fleurs
+     metrics:
+     - type: wer
+       value: 93.8%
+       name: test
+   - task:
+       type: automatic-speech-recognition
+       name: Speech Recognition
+     dataset:
+       name: train split of "SLR42" in openslr/openslr
+       type: openslr/openslr
+     metrics:
+     - type: wer
+       value: 69.7%
+       name: test
  ---

  # whisper-tiny-khmer-mlx-fp32
+ This model was converted to MLX format from [`openai-whisper-tiny`](https://github.com/openai/whisper), then fine-tuned for Khmer using two datasets:
+ - [seanghay/khmer_mpwt_speech](https://huggingface.co/datasets/seanghay/khmer_mpwt_speech)
+ - [seanghay/km-speech-corpus](https://huggingface.co/datasets/seanghay/km-speech-corpus)

  It achieves the following __word error rate__ (`wer`) on 2 popular datasets:
+ - ??? on `test` split of [google/fleurs](https://huggingface.co/datasets/google/fleurs) `km-kh`
+ - ??? on `train` split of [openslr/openslr](https://huggingface.co/datasets/openslr/openslr) `SLR42`

+ __NOTE__ The MLX format targets Apple silicon (M-series) Macs.

  ## Use with mlx
  ```bash

  path_or_hf_repo="Kimang18/whisper-tiny-khmer-mlx-fp32",
  fp16=False
  )
+ print(result['text'])
  ```
  Then execute this script `example.py` to see the result.

  You can also use command line in terminal
  ```bash
  mlx_whisper --model Kimang18/whisper-tiny-khmer-mlx-fp32 --task transcribe SPEECH_FILE_NAME --fp16 False
+ ```
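The model card reports its results as word error rate (WER). This commit does not include the evaluation script, so the following is only a minimal illustrative sketch of the standard definition, not the procedure behind the reported numbers. It computes the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. It assumes plain whitespace tokenization, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Sketch of WER: word-level Levenshtein distance / number of reference words.

    Assumption: words are obtained by splitting on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference prefix processed so far
    # and the first j hypothesis words (starts with the empty reference prefix).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # i deletions to reach an empty hypothesis
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution (b -> x) and one deletion (d) over 4 reference words.
    print(word_error_rate("a b c d", "a x c"))
```

Insertions are counted, so WER is not bounded by 1. For example, `word_error_rate("a", "a b c")` is 2.0. Khmer is usually written without spaces between words, so the way transcripts are segmented into words can change the score considerably.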