|
# Open-ASQA-Speech for R1-A |
|
|
|
Currently supported datasets:
|
- MOSEI |
|
- LibriTTS |
|
- IEMOCAP
|
|
|
## Dataset Usage |
|
|
|
### MOSEI |
|
You can access the data with `datasets/affect/get_data.py` from `https://github.com/pliang279/MultiBench`, which returns batches of the form [vision, audio, text, ind, label].
|
``` python
# Example: build the MOSEI dataloaders with MultiBench.
# Assumes the MultiBench repository root is on the Python path.
from datasets.affect.get_data import get_dataloader

traindata, validdata, test_robust = get_dataloader('./mosei_raw.pkl', data_type='mosei')
```
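Continuing the example above, the sketch below peeks at one training batch to check its layout. The element order is taken from the description above and is an assumption; it may differ across MultiBench versions.

``` python
# Minimal sketch: inspect one training batch from the MultiBench dataloader.
# The batch is assumed to be a list ordered as [vision, audio, text, ind, label];
# verify the layout for your MultiBench version.
for batch in traindata:
    for name, item in zip(["vision", "audio", "text", "ind", "label"], batch):
        print(name, getattr(item, "shape", type(item)))
    break
```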
|
|
|
### LibriTTS |
|
LibriTTS is a multi-speaker corpus of approximately 585 hours of read English speech at a 24 kHz sampling rate.
|
|
|
There are 7 splits (dots replace the dashes in the original split names, to comply with Hugging Face naming requirements):
|
- dev.clean, dev.other

- test.clean, test.other

- train.clean.100, train.clean.360, train.other.500
|
|
|
**Configurations**
|
The default configuration is "all". |
|
- "dev": only the "dev.clean" split (good for testing the dataset quickly) |
|
- "clean": contains only "clean" splits |
|
- "other": contains only "other" splits |
|
- "all": contains only "all" splits |
|
|
|
``` python
# Example: load the 100-hour clean training split from a local copy of the dataset.
from datasets import load_dataset

dataset = load_dataset("{your path}/libritts", "clean", split="train.clean.100")
```
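Each example in the loaded split can then be inspected like any other Hugging Face `datasets` split. The sketch below assumes the split exposes an `audio` column (decoded waveform plus sampling rate) and a normalized-text column; the exact column names depend on the local loading script, so check `dataset.column_names` first.

``` python
# Inspect the available columns and one example.
# Column names below ("audio", "text_normalized") are assumptions about the
# local loading script, not guaranteed; check dataset.column_names.
print(dataset.column_names)

sample = dataset[0]
audio = sample["audio"]            # assumed: dict with "array" and "sampling_rate"
print(audio["sampling_rate"])      # expected 24000 for LibriTTS
print(sample["text_normalized"])   # assumed transcript column
```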
|
|
|
### IEMOCAP
|
The Interactive Emotional Dyadic Motion Capture (IEMOCAP) database is an acted, multimodal, multi-speaker database collected at the SAIL lab at USC. It contains approximately 12 hours of audiovisual data, including video, speech, facial motion capture, and text transcriptions. The database is annotated by multiple annotators with categorical labels such as anger, happiness, sadness, and neutrality, as well as dimensional labels such as valence, activation, and dominance.
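As a lightweight illustration of the annotation schema described above, one utterance could be represented as below. The field names and example values are hypothetical, not an official IEMOCAP schema; only the label types mirror the description above.

``` python
from dataclasses import dataclass

@dataclass
class IemocapUtterance:
    """Illustrative container for one annotated IEMOCAP utterance.

    Field names are hypothetical; they mirror the categorical and dimensional
    label types described above, not an official schema.
    """
    utterance_id: str
    transcript: str
    emotion: str        # categorical label, e.g. "anger", "happiness", "sadness", "neutral"
    valence: float      # dimensional label
    activation: float   # dimensional label (arousal)
    dominance: float    # dimensional label

# Hypothetical example instance
u = IemocapUtterance(
    utterance_id="example_utterance_0001",
    transcript="...",
    emotion="neutral",
    valence=2.5,
    activation=2.5,
    dominance=2.5,
)
print(u.emotion, u.valence)
```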
|
|
|
|