Open-ASQA-Speech for R1-A
Currently supported datasets:
- MOSEI
- LibriTTS
- IEMOCAP
Dataset Usage
MOSEI
You can access the data with datasets/affect/get_data.py from https://github.com/pliang279/MultiBench, which returns the data in the [vision, audio, text, ind, label] format.
```python
# Example code (assumes you are running from the MultiBench repository root)
from datasets.affect.get_data import get_dataloader

traindata, validdata, test_robust = get_dataloader('./mosei_raw.pkl', data_type='mosei')
```
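To inspect what the loaders yield, a minimal sketch is shown below. It assumes the returned objects behave like standard PyTorch DataLoaders and that each batch follows the [vision, audio, text, ind, label] order described above; check the actual batch contents for your configuration.

```python
# Minimal sketch: peek at one batch from the train loader.
# Assumes standard PyTorch DataLoaders and the batch order noted above.
for batch in traindata:
    vision, audio, text, ind, label = batch
    print(vision.shape, audio.shape, text.shape, label.shape)
    break
```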
LibriTTS
LibriTTS is a multi-speaker English corpus of approximately 585 hours of read English speech at a 24 kHz sampling rate.
There are 7 splits (dots replace dashes from the original dataset to comply with Hugging Face naming requirements):
- dev.clean dev.other
- test.clean test.other
- train.clean.100 train.clean.360 train.other.500
**Configurations**: The default configuration is "all".
- "dev": only the "dev.clean" split (good for testing the dataset quickly)
- "clean": contains only "clean" splits
- "other": contains only "other" splits
- "all": contains only "all" splits
```python
# Example code
from datasets import load_dataset

dataset = load_dataset("{your path}/libritts", "clean", split="train.clean.100")
```
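After loading, you can inspect a single example. The field names below (audio, text_normalized, speaker_id) are assumptions based on common LibriTTS loaders on the Hub, not a documented schema for this repository, so adjust them to the actual columns of your dataset.

```python
# Sketch of inspecting one example; field names are assumptions, adjust as needed.
sample = dataset[0]
print(sample["text_normalized"])    # assumed transcript column
print(sample["speaker_id"])         # assumed speaker column
audio = sample["audio"]             # assumed Hugging Face Audio feature
print(audio["sampling_rate"], audio["array"].shape)
```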
IEMOCAP
The Interactive Emotional Dyadic Motion Capture (IEMOCAP) database is an acted, multimodal, and multispeaker database collected at the SAIL lab at USC. It contains approximately 12 hours of audiovisual data, including video, speech, motion capture of the face, and text transcriptions. The IEMOCAP database is annotated by multiple annotators into categorical labels, such as anger, happiness, sadness, and neutrality, as well as dimensional labels such as valence, activation, and dominance.
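The description above does not fix a concrete schema, but as a rough illustration, a single annotated utterance could be represented as follows. The field names and example values are illustrative assumptions based only on the label types listed above, not the dataset's actual format.

```python
# Hypothetical sketch of one IEMOCAP annotation, based only on the label
# types described above; field names are illustrative, not the actual schema.
from dataclasses import dataclass

@dataclass
class IemocapAnnotation:
    utterance_id: str    # illustrative identifier
    emotion: str         # categorical label, e.g. "anger", "happiness", "sadness", "neutral"
    valence: float       # dimensional label
    activation: float    # dimensional label (arousal)
    dominance: float     # dimensional label

example = IemocapAnnotation("utt_0001", "neutral", 3.0, 3.0, 3.0)
```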