|
# Open-ASQA-Speech for R1-A |
|
|
|
Currently supported datasets:
|
- MOSEI |
|
- LibriTTS |
|
- IEMOCAP
|
|
|
## Dataset Usage |
|
|
|
### MOSEI |
|
You can access the data with `datasets/affect/get_data.py` from `https://github.com/pliang279/MultiBench`, which returns batches of the form [vision, audio, text, ind, label].
|
``` python
# Example: build the MOSEI dataloaders with MultiBench.
# Assumes the MultiBench repository root is on the Python path.
from datasets.affect.get_data import get_dataloader

traindata, validdata, test_robust = get_dataloader('./mosei_raw.pkl', data_type='mosei')
```
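Continuing the example above, the sketch below peeks at one training batch to check its layout. The element order is taken from the description above and is an assumption; it may differ across MultiBench versions.

``` python
# Minimal sketch: inspect one training batch from the MultiBench dataloader.
# The batch is assumed to be a list ordered as [vision, audio, text, ind, label];
# verify the layout for your MultiBench version.
for batch in traindata:
    for name, item in zip(["vision", "audio", "text", "ind", "label"], batch):
        print(name, getattr(item, "shape", type(item)))
    break
```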
|
|
|
### LibriTTS |
|
LibriTTS is a multi-speaker corpus of approximately 585 hours of read English speech at a 24 kHz sampling rate.
|
|
|
There are 7 splits (dots replace the dashes in the original split names, to comply with Hugging Face naming requirements):
|
- dev.clean, dev.other

- test.clean, test.other

- train.clean.100, train.clean.360, train.other.500
|
|
|
**Configurations**
|
The default configuration is "all". |
|
- "dev": only the "dev.clean" split (good for testing the dataset quickly) |
|
- "clean": contains only "clean" splits |
|
- "other": contains only "other" splits |
|
- "all": contains only "all" splits |
|
|
|
``` python
# Example: load the 100-hour clean training split from a local copy of the dataset.
from datasets import load_dataset

dataset = load_dataset("{your path}/libritts", "clean", split="train.clean.100")
```
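Each example in the loaded split can then be inspected like any other Hugging Face `datasets` split. The sketch below assumes the split exposes an `audio` column (decoded waveform plus sampling rate) and a normalized-text column; the exact column names depend on the local loading script, so check `dataset.column_names` first.

``` python
# Inspect the available columns and one example.
# Column names below ("audio", "text_normalized") are assumptions about the
# local loading script, not guaranteed; check dataset.column_names.
print(dataset.column_names)

sample = dataset[0]
audio = sample["audio"]            # assumed: dict with "array" and "sampling_rate"
print(audio["sampling_rate"])      # expected 24000 for LibriTTS
print(sample["text_normalized"])   # assumed transcript column
```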
|
|
|
### IEMOCAP
|
The Interactive Emotional Dyadic Motion Capture (IEMOCAP) database is an acted, multimodal, multi-speaker database collected at the SAIL lab at USC. It contains approximately 12 hours of audiovisual data, including video, speech, facial motion capture, and text transcriptions. The database is annotated by multiple annotators with categorical labels such as anger, happiness, sadness, and neutrality, as well as dimensional labels such as valence, activation, and dominance.
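As a lightweight illustration of the annotation schema described above, one utterance could be represented as below. The field names and example values are hypothetical, not an official IEMOCAP schema; only the label types mirror the description above.

``` python
from dataclasses import dataclass

@dataclass
class IemocapUtterance:
    """Illustrative container for one annotated IEMOCAP utterance.

    Field names are hypothetical; they mirror the categorical and dimensional
    label types described above, not an official schema.
    """
    utterance_id: str
    transcript: str
    emotion: str        # categorical label, e.g. "anger", "happiness", "sadness", "neutral"
    valence: float      # dimensional label
    activation: float   # dimensional label (arousal)
    dominance: float    # dimensional label

# Hypothetical example instance
u = IemocapUtterance(
    utterance_id="example_utterance_0001",
    transcript="...",
    emotion="neutral",
    valence=2.5,
    activation=2.5,
    dominance=2.5,
)
print(u.emotion, u.valence)
```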
|
|
|
|