---
title: Ass1
emoji: 🏢
colorFrom: red
colorTo: purple
sdk: gradio
sdk_version: 3.0.17
app_file: app.py
pinned: false
---

# EE 298 DL Assignment 1 (2S2021-22) by Paul Darvin

A demo application for sound event detection, hosted as a Hugging Face Space.

## Link to Original/Reference Code

The code in this repository was derived solely from the PANNs inference GitHub repository, which is an extension of the parent repository for the paper *PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition*.
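
For orientation, below is a minimal sketch of how the panns_inference package is typically used for sound event detection. The file name `example.wav` is illustrative, and this is not necessarily the exact code in `app.py`.

```python
import librosa
import numpy as np
from panns_inference import SoundEventDetection, labels

# Load a mono clip at 32 kHz, the sample rate the PANNs checkpoints expect.
(audio, _) = librosa.load('example.wav', sr=32000, mono=True)
audio = audio[None, :]  # (batch_size, num_samples)

# checkpoint_path=None downloads the pretrained CNN14 checkpoint on first use.
sed = SoundEventDetection(checkpoint_path=None, device='cpu')
framewise_output = sed.inference(audio)  # (batch_size, num_frames, num_classes)

# Print the most likely AudioSet label for each of the first few frames.
top_classes = np.argmax(framewise_output[0], axis=-1)
print([labels[i] for i in top_classes[:10]])
```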

## Background

A sound event detection system is an audio tagging system applied to time segments of an audio signal. It identifies tags such as the presence of an object, a living thing, or an action that generates sound at a particular time.
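
As an illustration of the idea (not code from this repository), per-frame class probabilities can be turned into time-indexed tags by simple thresholding; the function name and threshold below are assumptions made for the sketch.

```python
import numpy as np

def framewise_to_events(framewise_probs, class_names, frame_hop_seconds, threshold=0.3):
    """framewise_probs: (num_frames, num_classes) array of per-frame probabilities."""
    events = []
    for class_idx, name in enumerate(class_names):
        # A class is "active" in every frame where its probability exceeds the threshold.
        for frame_idx in np.flatnonzero(framewise_probs[:, class_idx] > threshold):
            events.append((frame_idx * frame_hop_seconds, name))
    return sorted(events)

# Toy example: two classes over four frames, 10 ms per frame.
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.7], [0.0, 0.6]])
print(framewise_to_events(probs, ['speech', 'dog bark'], frame_hop_seconds=0.01))
```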

## Significance

Applications of sound event detection systems are wide-ranging. For instance, a deaf person can use such a system to detect an approaching vehicle or watch a movie with its sounds described to them. It can aid forensics by identifying the presence of objects and actions in audio evidence. It can also be used to navigate a large audio file using time-indexed tags. Robots can be made more "human" by giving them the ability to interpret audio signals the way humans do.

## Model Description

CNN14 is a 14-layer convolutional neural network composed of six convolutional blocks (two convolutional layers each) followed by two fully connected layers. Its input front end converts the raw audio into a log-mel spectrogram with 1000 frames and 64 mel bins, effectively translating the audio into image-like data. The details of the architecture can be found in the paper.
The authors report a mean average precision (mAP) of 0.431 for CNN14, which exceeded the best system's mAP (0.392) at the time of publication.
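
A rough sketch of that log-mel front end using librosa is shown below; the hyperparameters (32 kHz sample rate, 1024-sample window, 320-sample hop, 64 mel bins) follow the PANNs paper and should be verified against the reference code.

```python
import librosa

# Load 10 seconds of mono audio at 32 kHz (10 s * 32 kHz / 320 hop ≈ 1000 frames).
audio, sr = librosa.load('example.wav', sr=32000, mono=True, duration=10.0)

mel = librosa.feature.melspectrogram(
    y=audio, sr=sr, n_fft=1024, hop_length=320, n_mels=64)
log_mel = librosa.power_to_db(mel)  # (64 mel bins, ~1000 frames), an image-like array
print(log_mel.shape)
```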

## Usage

Upload an audio file in WAV format. Other formats are not yet supported.
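
For context, here is a hypothetical sketch of how a Gradio 3.x `app.py` could wire an uploaded WAV file to PANNs sound event detection; this is not the repository's actual `app.py`.

```python
import gradio as gr
import librosa
import numpy as np
from panns_inference import SoundEventDetection, labels

# Load the pretrained CNN14 sound event detection model once at startup.
sed = SoundEventDetection(checkpoint_path=None, device='cpu')

def detect(wav_path):
    # Gradio passes the path of the uploaded WAV file.
    audio, _ = librosa.load(wav_path, sr=32000, mono=True)
    framewise = sed.inference(audio[None, :])[0]  # (num_frames, num_classes)
    # Report the five classes with the highest peak framewise probability.
    top = np.argsort(framewise.max(axis=0))[::-1][:5]
    return ", ".join(labels[i] for i in top)

demo = gr.Interface(fn=detect,
                    inputs=gr.Audio(source="upload", type="filepath"),
                    outputs="text")
demo.launch()
```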