From ChatGPT Prompts to Audio Classification: A Creative Journey in Sound

In artificial intelligence, audio models hold a unique place. They listen to, analyze, and interpret sounds, powering applications that range from speech recognition to music generation. But training these models requires a diverse dataset, and that’s where Shakhizat Nurgaliyev’s project shines.

Shaping Sounds with Generative Models: A New Approach

Traditionally, building a dataset for an audio model means carefully setting up microphones, recording lengthy audio sequences, and cleaning the data by hand. Nurgaliyev’s project takes a different route that skips this laborious process: he uses generative models to create synthetic training data from scratch.

ChatGPT: The Master of Audio Prompts

Nurgaliyev turned to ChatGPT, OpenAI’s large language model, to write the audio descriptions. Using a simple prompt, he had it generate 300 detailed descriptions spanning three audio classes: speech, music, and background noise. These descriptions became the foundation of the synthetic dataset.
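The article doesn’t reproduce the prompt itself. As a minimal sketch, the helpers below build a request for each class and parse a numbered-list reply into (label, description) pairs. The prompt wording, the even 100-per-class split, and the class identifiers are assumptions for illustration, and the actual call to ChatGPT is left out.

```python
import re
from typing import List

# Hypothetical class identifiers for the three classes named in the article.
CLASSES = ["speech", "music", "background_noise"]


def build_prompt(label: str, count: int) -> str:
    """Build a request asking for `count` one-sentence sound descriptions."""
    return (
        f"Write {count} distinct, one-sentence descriptions of audio clips "
        f"in the category '{label}'. Describe the sound source, the setting, "
        f"and the texture of the sound. Return them as a numbered list."
    )


def parse_numbered_list(reply: str) -> List[str]:
    """Extract items from reply lines formatted like '1. text' or '2) text'."""
    items = []
    for line in reply.splitlines():
        match = re.match(r"^\s*\d+[.)]\s+(.*\S)\s*$", line)
        if match:
            items.append(match.group(1))
    return items


# Assumed even split: 300 descriptions across 3 classes -> 100 per class.
per_class = 300 // len(CLASSES)
prompts = {label: build_prompt(label, per_class) for label in CLASSES}

if __name__ == "__main__":
    # A made-up reply, parsed into labeled descriptions.
    sample_reply = (
        "1. A man reads a weather report in a quiet studio.\n"
        "2) Two friends chat over coffee in a busy cafe."
    )
    dataset = [("speech", text) for text in parse_numbered_list(sample_reply)]
    print(dataset)
```

Parsing the reply line by line tolerates the small formatting variations a language model tends to produce, such as `1.` versus `1)` numbering and stray blank lines.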

NVIDIA Hardware Meets Meta’s AudioCraft: Generating Sounds from Descriptions

To turn these descriptions into actual audio, Nurgaliyev ran Meta’s AudioCraft model on an NVIDIA Jetson AGX Orin Developer Kit. This combination produced sound snippets that matched the descriptions ChatGPT had written.
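The article doesn’t show the generation script, so the sketch below covers only the bookkeeping around it. Each (label, description) pair is rendered to a 16-bit mono WAV file whose name starts with the class label. That naming is an assumption, chosen because Edge Impulse’s uploader can, as far as we know, infer labels from file names (check its documentation). The `generate` function is a hypothetical plug-in point: in the project, AudioCraft on the Jetson would fill that role (for example via AudioCraft’s `AudioGen` class), while here a placeholder tone stands in so the code runs anywhere.

```python
import math
import struct
import tempfile
import wave
from pathlib import Path
from typing import Callable, Iterable, List, Sequence, Tuple

# Hypothetical generator type: text description -> mono float samples in [-1, 1].
# In the real pipeline this would wrap the AudioCraft model on the Jetson.
Generator = Callable[[str], Sequence[float]]


def write_wav(path: Path, samples: Sequence[float], sample_rate: int) -> None:
    """Write mono float samples to a 16-bit PCM WAV file, clipping to [-1, 1]."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(frames)


def render_dataset(
    items: Iterable[Tuple[str, str]],
    generate: Generator,
    out_dir: Path,
    sample_rate: int,
) -> List[Path]:
    """Render (label, description) pairs to files named '<label>.<index>.wav'."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, (label, description) in enumerate(items):
        path = out_dir / f"{label}.{index:04d}.wav"
        write_wav(path, generate(description), sample_rate)
        paths.append(path)
    return paths


def placeholder_tone(description: str, sample_rate: int = 16000) -> List[float]:
    """Stand-in generator: one second of a 440 Hz tone; ignores the text."""
    return [0.5 * math.sin(2 * math.pi * 440 * i / sample_rate) for i in range(sample_rate)]


if __name__ == "__main__":
    items = [("music", "A solo piano plays a slow waltz in a small hall.")]
    with tempfile.TemporaryDirectory() as tmp:
        for path in render_dataset(items, placeholder_tone, Path(tmp), 16000):
            print(path.name)
```

Keeping the generator pluggable separates the slow, GPU-bound model call from file handling, so the dataset layout can be checked without running the model.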

Edge Impulse: A Platform for Audio Classification

With the synthetic dataset in hand, Nurgaliyev turned to Edge Impulse, a platform for developing machine learning models for embedded devices. He created an audio classification project, uploaded the generated samples, and designed an impulse (Edge Impulse’s name for a combined processing and learning pipeline) using the MFE (Mel-filterbank energy) audio processing block and a Keras classifier model.
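An MFE block turns raw audio into Mel-filterbank energies. The signal is split into short overlapping frames, each frame’s power spectrum is pooled through triangular filters spaced on the mel scale, and the result is converted to decibels and clamped at a noise floor. The sketch below is a generic, simplified version of that computation for illustration, not Edge Impulse’s implementation. The parameter names only loosely follow the block’s settings, the values are illustrative rather than the project’s, no window function is applied, and a naive DFT stands in for an FFT to keep the code dependency-free.

```python
import math
from typing import List, Optional, Sequence


def hz_to_mel(hz: float) -> float:
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel: float) -> float:
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(num_filters: int, n_fft: int, sample_rate: int,
                   low_hz: float = 0.0, high_hz: Optional[float] = None) -> List[List[float]]:
    """Triangular filters over the n_fft // 2 + 1 power-spectrum bins."""
    high_hz = min(high_hz or sample_rate / 2, sample_rate / 2)
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    # num_filters + 2 points, evenly spaced in mel, mapped to FFT bin indices.
    points = [low_mel + (high_mel - low_mel) * i / (num_filters + 1)
              for i in range(num_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in points]
    filters = []
    for f in range(1, num_filters + 1):
        left, center, right = bins[f - 1], bins[f], bins[f + 1]
        weights = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):  # rising edge of the triangle
            weights[k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            weights[k] = (right - k) / (right - center)
        filters.append(weights)
    return filters


def mfe(signal: Sequence[float], sample_rate: int, frame_length: float = 0.02,
        frame_stride: float = 0.01, num_filters: int = 40, n_fft: int = 512,
        noise_floor_db: float = -52.0) -> List[List[float]]:
    """Return one row of num_filters log energies (dB) per frame."""
    frame_size = int(round(frame_length * sample_rate))
    step = int(round(frame_stride * sample_rate))
    filters = mel_filterbank(num_filters, n_fft, sample_rate)
    cos_t = [math.cos(2 * math.pi * i / n_fft) for i in range(n_fft)]
    sin_t = [math.sin(2 * math.pi * i / n_fft) for i in range(n_fft)]
    rows = []
    for start in range(0, len(signal) - frame_size + 1, step):
        # Cut one frame, then truncate or zero-pad it to n_fft samples.
        frame = list(signal[start:start + frame_size])[:n_fft]
        frame += [0.0] * (n_fft - len(frame))
        # Naive DFT power spectrum (an FFT would be used in practice).
        power = []
        for k in range(n_fft // 2 + 1):
            re = sum(x * cos_t[(k * n) % n_fft] for n, x in enumerate(frame))
            im = sum(x * sin_t[(k * n) % n_fft] for n, x in enumerate(frame))
            power.append((re * re + im * im) / n_fft)
        row = []
        for weights in filters:
            energy = sum(w * p for w, p in zip(weights, power))
            db = 10.0 * math.log10(max(energy, 1e-12))
            row.append(max(db, noise_floor_db))  # clamp to the noise floor
        rows.append(row)
    return rows


if __name__ == "__main__":
    # 50 ms of a 1 kHz tone at 16 kHz.
    tone = [0.5 * math.sin(2 * math.pi * 1000 * i / 16000) for i in range(800)]
    features = mfe(tone, 16000)
    print(len(features), len(features[0]))
```

The resulting frames-by-filters matrix is the kind of compact, image-like input a small Keras classifier can learn from, which is why mel-scaled energies are a common front end for embedded audio models.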

Arduino: The Gateway to Audio Classification in the Real World

To bring the audio classification model into the physical world, Nurgaliyev built it as an Arduino library and loaded it onto an Arduino GIGA R1 WiFi board. A simple sketch continuously listens for audio, runs the classifier, and shows the predicted label on the screen of the GIGA Display Shield.
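The on-board sketch itself isn’t shown in the article. A common way to keep a continuously updated label from flickering is to average the classifier’s scores over the last few audio windows before choosing what to display. The Python sketch below models only that post-processing step. The `ContinuousClassifier` class, its `window` and `threshold` values, and the "uncertain" fallback label are hypothetical illustrations, not details of Nurgaliyev’s code.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple


class ContinuousClassifier:
    """Smooth per-window classifier scores with a moving average.

    Hypothetical helper: `window` and `threshold` are illustrative choices,
    not settings taken from the original project.
    """

    def __init__(self, labels: Sequence[str], window: int = 4, threshold: float = 0.6):
        self.labels = list(labels)
        self.threshold = threshold
        self.history: Deque[List[float]] = deque(maxlen=window)

    def update(self, scores: Sequence[float]) -> Tuple[str, float]:
        """Add one window's scores; return (label to display, averaged score)."""
        if len(scores) != len(self.labels):
            raise ValueError("expected one score per label")
        self.history.append(list(scores))
        count = len(self.history)
        averaged = [sum(column) / count for column in zip(*self.history)]
        best = max(range(len(self.labels)), key=lambda i: averaged[i])
        if averaged[best] < self.threshold:
            return "uncertain", averaged[best]
        return self.labels[best], averaged[best]


if __name__ == "__main__":
    clf = ContinuousClassifier(["speech", "music", "background_noise"], window=3)
    # Made-up score vectors standing in for successive model outputs.
    for scores in ([0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.1, 0.8, 0.1], [0.1, 0.9, 0.0]):
        label, confidence = clf.update(scores)
        print(f"{label:>16}  {confidence:.2f}")
```

Averaging trades a little latency for a steadier display: a larger `window` smooths more but reacts more slowly when the sound changes.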

Hackster.io: Sharing the Journey

Nurgaliyev’s project is meticulously documented on Hackster.io, a vibrant community for hardware enthusiasts. The write-up provides comprehensive details, allowing others to replicate and extend his work.

Bonus: The Art of Audio Manipulation

Beyond classification, Nurgaliyev’s approach opens up exciting possibilities for audio manipulation. Generative models can create unique and varied soundscapes that blur the line between recorded and imagined sound, with uses ranging from personalized soundtracks to immersive audio for virtual reality.

Nurgaliyev’s project is a testament to the power of creativity and innovation in artificial intelligence. By combining ChatGPT’s language generation with generative audio models, he has devised a new approach to building audio datasets, one that opens new avenues for research and development in audio classification and beyond.