title: Ass1
emoji: 🏢
colorFrom: red
colorTo: purple
sdk: gradio
sdk_version: 3.0.17
app_file: app.py
pinned: false
EE 298 DL Assignment 1 (2S2021-22) by Paul Darvin
Demo Application for Sound Event Detection in a Hugging Face Space
Link to Original/Reference Code
The code in this repository is derived solely from the PANNs inference GitHub repository, which extends the main repository for the paper PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition.
Background
A sound event detection system is an audio tagging system applied to time segments of an audio signal. It assigns tags, such as the presence of an object, a living thing, or an action that generates sound, to the times at which they occur.
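The idea of tagging time segments can be sketched as follows. This is a minimal illustration, not the code used in this Space: the function name `frames_to_events`, the 0.5 threshold, and the rate of 100 frames per second are all assumptions chosen for the example. It takes per-frame probabilities for a single tag and merges consecutive frames above the threshold into time-stamped events.

```python
def frames_to_events(probs, threshold=0.5, frames_per_second=100):
    """Turn per-frame probabilities for one tag into (start_s, end_s) events.

    All names and defaults here are hypothetical, chosen for illustration.
    """
    events = []
    start = None  # index of the first frame of the event currently open
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i  # an event begins at this frame
        elif p < threshold and start is not None:
            # the event ends just before this frame
            events.append((start / frames_per_second, i / frames_per_second))
            start = None
    if start is not None:
        # the event is still active at the end of the clip
        events.append((start / frames_per_second, len(probs) / frames_per_second))
    return events


if __name__ == "__main__":
    # One probability per second, so frame indices equal seconds.
    print(frames_to_events([0.1, 0.9, 0.8, 0.2, 0.7], frames_per_second=1))
```

Running this per tag would give a time-indexed list of events for each sound class; a real system would typically also smooth the probabilities before thresholding.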
Significance
Sound event detection systems have wide-ranging applications. For instance, a deaf person can use such a system to detect an approaching vehicle or to watch a movie with its sounds described. It can aid forensics by identifying the presence of objects and actions in audio evidence. It can also be used to navigate a large audio file using time-indexed tags. Robots can be made more "human" by giving them the ability to interpret audio signals the way humans do.
Model Description
CNN14 is a 14-layer convolutional neural network with 6 convolutional blocks. Its input stage converts the audio into a log-mel spectrogram with 1000 frames and 64 mel bins, so that the audio can be processed like image data. The details of the architecture can be found in the paper.
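The spectrogram size can be checked with a little arithmetic. The sketch below assumes a 10-second clip sampled at 32 kHz with a hop of 320 samples between frames. These three values are assumptions made for the example and are not stated in this README; only the 64 mel bins and the 1000 frames come from the description above. The helper name `spectrogram_shape` is hypothetical.

```python
# Assumed values for illustration; check the paper for the actual settings.
CLIP_SECONDS = 10      # assumed clip length
SAMPLE_RATE = 32_000   # assumed sampling rate in Hz
HOP_SIZE = 320         # assumed number of samples between frames
MEL_BINS = 64          # number of mel bins, as stated in the description


def spectrogram_shape(clip_seconds, sample_rate, hop_size, mel_bins):
    """Return (frames, mel_bins) for a clip, counting one frame per hop."""
    n_samples = clip_seconds * sample_rate
    n_frames = n_samples // hop_size
    return n_frames, mel_bins


if __name__ == "__main__":
    print(spectrogram_shape(CLIP_SECONDS, SAMPLE_RATE, HOP_SIZE, MEL_BINS))
```

Under these assumptions each frame covers 10 ms of audio. Depending on how the signal is padded at its edges, an implementation may produce one frame more or less than this count.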
The authors report a mean average precision (mAP) of 0.431 for CNN14, which exceeded the mAP of the best previous system (0.392) at the time of publication.
Usage
Upload an audio file in WAV format. Other formats are not yet supported.
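Before uploading, a file can be checked locally with Python's standard `wave` module. This is a sketch of a pre-upload check, not part of the Space's own code; the helper name `describe_wav` is hypothetical. Note that the `wave` module reads uncompressed PCM WAV files, so it may reject some WAV variants that other tools accept.

```python
import wave


def describe_wav(path):
    """Return basic properties of a WAV file, or raise ValueError if unreadable.

    Hypothetical helper for illustration; relies only on the standard library.
    """
    try:
        with wave.open(path, "rb") as wav:
            n_frames = wav.getnframes()
            sample_rate = wav.getframerate()
            return {
                "channels": wav.getnchannels(),
                "sample_rate": sample_rate,
                "duration_s": n_frames / sample_rate,
            }
    except (wave.Error, EOFError) as exc:
        # wave.Error: not a RIFF/WAVE file or unsupported encoding
        # EOFError: file too short to contain a valid header
        raise ValueError(f"not a readable WAV file: {path}") from exc
```

For example, `describe_wav("recording.wav")` returns a dictionary with the channel count, sample rate, and duration in seconds, where `recording.wav` stands for any local file.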