Create README.md
README.md
ADDED
@@ -0,0 +1,35 @@
# Open-ASQA-Speech for R1-A

Currently supported datasets:
- LibriTTS
- MOSEI

## Dataset Usage

### MOSEI
You can access the data with `datasets/affect/get_data.py` from `https://github.com/pliang279/MultiBench`, whose dataloaders yield batches of the form [vision, audio, text, ind, label].
``` python
# Example code (assumes the MultiBench repository root is on sys.path)
from datasets.affect.get_data import get_dataloader

traindata, validdata, test_robust = get_dataloader('./mosei_raw.pkl', data_type='mosei')
```
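
As a rough illustration of the [vision, audio, text, ind, label] layout described above, the sketch below gives names to the five fields of one batch. The helper `unpack_batch` and the `FIELDS` tuple are hypothetical, and the field order is taken from this README rather than verified against MultiBench; placeholder values stand in for real tensors.

```python
# Field order as described in this README; not verified against MultiBench.
FIELDS = ("vision", "audio", "text", "ind", "label")


def unpack_batch(batch):
    """Map one five-element batch to a dict keyed by field name (hypothetical helper)."""
    if len(batch) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(batch)}")
    return dict(zip(FIELDS, batch))


# Placeholder values stand in for the tensors a real dataloader would yield.
example = unpack_batch(["v", "a", "t", 0, 1.0])
print(sorted(example))
```

With a real loader, the same helper could presumably be applied inside a loop such as `for batch in traindata: fields = unpack_batch(batch)`.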

### LibriTTS
LibriTTS is a multi-speaker English corpus of approximately 585 hours of read speech at a 24 kHz sampling rate.

There are 7 splits (dots replace the dashes used in the original dataset's split names, to comply with Hugging Face naming requirements):
- dev.clean, dev.other
- test.clean, test.other
- train.clean.100, train.clean.360, train.other.500

**Configurations**

The default configuration is "all".
- "dev": only the "dev.clean" split (useful for testing the dataset quickly)
- "clean": only the "clean" splits
- "other": only the "other" splits
- "all": all splits
``` python
# Example code
from datasets import load_dataset

dataset = load_dataset("blabble-io/libritts", "clean", split="train.clean.100")
```
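
The split naming and the configurations described above can be sketched in plain Python. The names `to_hf_split_name`, `ALL_SPLITS`, and `splits_for_config` are hypothetical helpers written for this illustration, and the configuration-to-split mapping is inferred from the list above rather than read from the dataset's loading script.

```python
# Original LibriTTS subset names, which use dashes.
ORIGINAL_SPLITS = [
    "dev-clean", "dev-other",
    "test-clean", "test-other",
    "train-clean-100", "train-clean-360", "train-other-500",
]


def to_hf_split_name(original_name):
    """Replace dashes with dots, as the README describes (hypothetical helper)."""
    return original_name.replace("-", ".")


ALL_SPLITS = [to_hf_split_name(name) for name in ORIGINAL_SPLITS]


def splits_for_config(config="all"):
    """Return the splits a configuration is assumed to contain."""
    if config == "all":
        return list(ALL_SPLITS)
    if config == "dev":
        return ["dev.clean"]
    if config in ("clean", "other"):
        # Keep splits whose dot-separated name contains the config word.
        return [s for s in ALL_SPLITS if config in s.split(".")]
    raise ValueError(f"unknown configuration: {config!r}")


print(splits_for_config("clean"))
```

Under these assumptions, `splits_for_config("clean")` includes `train.clean.100`, the split requested in the example above.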