ziyue committed on
Commit
d15450b
·
verified ·
1 Parent(s): 50dc7f6

Create README.md

Files changed (1)
  1. README.md +35 -0
README.md ADDED
@@ -0,0 +1,35 @@
+ # Open-ASQA-Speech for R1-A
+
+ Currently supported datasets:
+ - LibriTTS
+ - MOSEI
+
+ ## Dataset Usage
+
+ ### MOSEI
+ You can access the data with `datasets/affect/get_data.py` from `https://github.com/pliang279/MultiBench`, which returns [vision, audio, text, ind, label].
+ ``` python
+ from datasets.affect.get_data import get_dataloader  # run from the MultiBench repository root
+ traindata, validdata, test_robust = get_dataloader('./mosei_raw.pkl', data_type='mosei')
+ ```
+
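As a minimal sketch of working with the returned fields, the helper below maps a positional batch onto names. It assumes each batch is a five-item sequence in the order listed above; `MOSEI_FIELDS` and `unpack_mosei_batch` are hypothetical names, not part of MultiBench, and the actual batch layout may depend on the options passed to `get_dataloader`.

```python
from typing import Any, Dict, Sequence

# Field order as listed in the README text; treated here as an assumption.
MOSEI_FIELDS = ("vision", "audio", "text", "ind", "label")


def unpack_mosei_batch(batch: Sequence[Any]) -> Dict[str, Any]:
    """Map a positional batch onto named fields (hypothetical helper)."""
    if len(batch) != len(MOSEI_FIELDS):
        raise ValueError(
            f"expected {len(MOSEI_FIELDS)} items, got {len(batch)}"
        )
    return dict(zip(MOSEI_FIELDS, batch))


# Stand-in batch with placeholder values; real batches would hold tensors.
dummy_batch = [[0.1, 0.2], [0.3, 0.4], ["hello"], [0], [1.0]]
fields = unpack_mosei_batch(dummy_batch)
print(list(fields))
```

Failing loudly on an unexpected length makes a layout mismatch visible immediately rather than silently mislabeling a modality.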
+ ### LibriTTS
+ LibriTTS is a multi-speaker corpus of approximately 585 hours of read English speech at a 24 kHz sampling rate.
+
+ There are 7 splits (dots replace the dashes used in the original dataset, to comply with Hugging Face naming requirements):
+ - dev.clean dev.other
+ - test.clean test.other
+ - train.clean.100 train.clean.360 train.other.500
+
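The renaming rule can be sketched as a pair of small helpers. This is an illustration only: `to_hub_split_name` and `to_original_split_name` are hypothetical names, and the sketch assumes the rule is a plain character swap between dashes and dots.

```python
def to_hub_split_name(original_name: str) -> str:
    """Convert an original LibriTTS split name (dashes) to the dotted form."""
    return original_name.replace("-", ".")


def to_original_split_name(hub_name: str) -> str:
    """Convert a dotted split name back to the original dashed form."""
    return hub_name.replace(".", "-")


# Split names as used by the original LibriTTS release.
original_splits = [
    "dev-clean", "dev-other",
    "test-clean", "test-other",
    "train-clean-100", "train-clean-360", "train-other-500",
]
hub_splits = [to_hub_split_name(name) for name in original_splits]
print(hub_splits)
```

The mapping is reversible because none of the split names contain a dot or dash for any other reason.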
+ **Configurations**
+ The default configuration is "all".
+ - "dev": only the "dev.clean" split (good for testing the dataset quickly)
+ - "clean": contains only "clean" splits
+ - "other": contains only "other" splits
+ - "all": contains all splits
+
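The relationship between configurations and splits can be sketched as a lookup. This is an assumption-laden illustration, not the dataset's loading script: `splits_for_config` is a hypothetical helper, "all" is read as covering every split, and "clean" and "other" are read as covering every split whose name contains that word.

```python
from typing import List

# The seven split names listed in this README.
ALL_SPLITS = [
    "dev.clean", "dev.other",
    "test.clean", "test.other",
    "train.clean.100", "train.clean.360", "train.other.500",
]


def splits_for_config(config: str = "all") -> List[str]:
    """Return the split names a configuration is expected to expose."""
    if config == "all":
        return list(ALL_SPLITS)
    if config == "dev":
        return ["dev.clean"]
    if config in ("clean", "other"):
        # Match on a whole dot-separated component of the split name.
        return [s for s in ALL_SPLITS if config in s.split(".")]
    raise ValueError(f"unknown configuration: {config!r}")


print(splits_for_config("clean"))
```

A check like this can be used to confirm that a requested `split` belongs to the chosen configuration before starting a download.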
+ ``` python
+ from datasets import load_dataset  # Hugging Face `datasets` library
+ dataset = load_dataset("blabble-io/libritts", "clean", split="train.clean.100")
+ ```
+