Open-ASQA-Speech for R1-A
Currently supported datasets:
- MOSEI
- LibriTTS
- IEMOCAP
Dataset Usage
MOSEI
You can access the data with datasets/affect/get_data.py from https://github.com/pliang279/MultiBench, which returns the data in the [vision, audio, text, ind, label] format.
```python
# Example code (assumes you are running from the MultiBench repository root)
from datasets.affect.get_data import get_dataloader

traindata, validdata, test_robust = get_dataloader('./mosei_raw.pkl', data_type='mosei')
```
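To inspect what the loaders yield, a minimal sketch is shown below. It assumes the returned objects behave like standard PyTorch DataLoaders and that each batch follows the [vision, audio, text, ind, label] order described above; check the actual batch contents for your configuration.

```python
# Minimal sketch: peek at one batch from the train loader.
# Assumes standard PyTorch DataLoaders and the batch order noted above.
for batch in traindata:
    vision, audio, text, ind, label = batch
    print(vision.shape, audio.shape, text.shape, label.shape)
    break
```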
LibriTTS
LibriTTS is a multi-speaker English corpus of approximately 585 hours of read English speech at a 24 kHz sampling rate.
There are 7 splits (dots replace dashes from the original dataset to comply with Hugging Face naming requirements):
- dev.clean dev.other
- test.clean test.other
- train.clean.100 train.clean.360 train.other.500
**Configurations**: The default configuration is "all".
- "dev": only the "dev.clean" split (good for testing the dataset quickly)
- "clean": contains only "clean" splits
- "other": contains only "other" splits
- "all": contains only "all" splits
```python
# Example code
from datasets import load_dataset

dataset = load_dataset("{your path}/libritts", "clean", split="train.clean.100")
```
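After loading, you can inspect a single example. The field names below (audio, text_normalized, speaker_id) are assumptions based on common LibriTTS loaders on the Hub, not a documented schema for this repository, so adjust them to the actual columns of your dataset.

```python
# Sketch of inspecting one example; field names are assumptions, adjust as needed.
sample = dataset[0]
print(sample["text_normalized"])    # assumed transcript column
print(sample["speaker_id"])         # assumed speaker column
audio = sample["audio"]             # assumed Hugging Face Audio feature
print(audio["sampling_rate"], audio["array"].shape)
```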
IEMOCAP
The Interactive Emotional Dyadic Motion Capture (IEMOCAP) database is an acted, multimodal, and multispeaker database collected at the SAIL lab at USC. It contains approximately 12 hours of audiovisual data, including video, speech, motion capture of the face, and text transcriptions. The IEMOCAP database is annotated by multiple annotators into categorical labels, such as anger, happiness, sadness, and neutrality, as well as dimensional labels such as valence, activation, and dominance.
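The description above does not fix a concrete schema, but as a rough illustration, a single annotated utterance could be represented as follows. The field names and example values are illustrative assumptions based only on the label types listed above, not the dataset's actual format.

```python
# Hypothetical sketch of one IEMOCAP annotation, based only on the label
# types described above; field names are illustrative, not the actual schema.
from dataclasses import dataclass

@dataclass
class IemocapAnnotation:
    utterance_id: str    # illustrative identifier
    emotion: str         # categorical label, e.g. "anger", "happiness", "sadness", "neutral"
    valence: float       # dimensional label
    activation: float    # dimensional label (arousal)
    dominance: float     # dimensional label

example = IemocapAnnotation("utt_0001", "neutral", 3.0, 3.0, 3.0)
```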