Speech recognition technology has become commonplace, but Kazakh remains a challenge. Because there are few large public datasets for training keyword spotting models, the language has been left behind on voice-activated devices and smart assistants. But not for long.
Shakhizat and Askat: The Dynamic Duo of Speech Recognition
Enter Shakhizat Nurgaliyev and Askat Kuzdeuov, two Kazakh innovators who dared to break down language barriers. They embarked on a mission to develop a speech recognition model for Kazakh, using a combination of cutting-edge technology and a sprinkle of ingenuity.
Piper and Vosk: A Match Made in Speech Recognition Heaven
To tackle the data scarcity issue, Shakhizat and Askat turned to Piper, a neural text-to-speech system. They harnessed Piper’s power to generate synthetic speech datasets, providing the much-needed training data for their model. And to extract speech commands from the generated audio, they employed the Vosk Speech Recognition Toolkit, a powerful open-source tool.
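To make the extraction step more concrete, here is a minimal sketch of how word timings from a recognizer could be used to cut keyword clips out of generated audio. It is not the authors' code. It assumes the recognizer returns JSON with a "result" list of words carrying "start" and "end" times in seconds, which is the shape Vosk produces when word timings are enabled. The function name `cut_keyword_clips`, the one-second clip length, and the 16 kHz sample rate are all assumptions made for illustration.

```python
import json


def cut_keyword_clips(samples, sample_rate, recognizer_json, clip_seconds=1.0):
    """Cut one fixed-length clip per recognized word, centered on that word.

    samples         -- sequence of PCM samples for the whole generated utterance
    sample_rate     -- samples per second (16000 is assumed in the usage below)
    recognizer_json -- JSON string with a "result" list of word entries, each
                       holding "word", "start" and "end" (times in seconds)
    """
    words = json.loads(recognizer_json).get("result", [])
    clip_len = int(clip_seconds * sample_rate)
    clips = []
    for w in words:
        # Center the window on the midpoint of the word.
        center = int((w["start"] + w["end"]) / 2 * sample_rate)
        # Keep the window inside the audio where possible.
        begin = max(0, min(center - clip_len // 2, len(samples) - clip_len))
        clip = list(samples[begin:begin + clip_len])
        # Pad with silence if the audio is shorter than one clip.
        clip += [0] * (clip_len - len(clip))
        clips.append((w["word"], clip))
    return clips


if __name__ == "__main__":
    # Hypothetical two-second utterance containing two example Kazakh words.
    audio = [0] * 32000
    timings = json.dumps({"result": [
        {"word": "иә", "start": 0.25, "end": 0.75},
        {"word": "жоқ", "start": 1.25, "end": 1.75},
    ]})
    for label, clip in cut_keyword_clips(audio, 16000, timings):
        print(label, len(clip))
```

Each clip comes out labeled with the word it contains, so the clips can be sorted straight into per-keyword folders for training.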
Arduino Nicla Voice: The Perfect Platform for Embedded ML
With the model ready for deployment, Shakhizat and Askat needed a suitable platform. They found their match in the Arduino Nicla Voice development board, a compact and efficient device equipped with an nRF52832 SoC, a microphone, an IMU, and a Syntiant NDP120 Neural Decision Processor. The NDP120’s dedicated hardware accelerators promised faster inferencing and reduced power consumption, making it ideal for embedded applications.
Training the Model: A Journey of Patience and Precision
The model was trained on 20.25 hours of generated speech data, covering 28 output classes. After 100 training epochs, it achieved an impressive accuracy of 95.5%. Remarkably, the model occupied a mere 540 KB of memory on the NDP120, demonstrating its efficiency and suitability for embedded devices.
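For a rough sense of scale, the figures above can be turned into a per-class estimate. This is a back-of-the-envelope sketch only: the one-second clip length and the even split across classes are assumptions, not details reported for the project.

```python
total_hours = 20.25   # generated speech data, from the article
num_classes = 28      # output classes, from the article
clip_seconds = 1.0    # ASSUMPTION: one-second keyword clips

total_seconds = total_hours * 3600
num_clips = total_seconds / clip_seconds
# ASSUMPTION: clips are spread evenly across the classes.
clips_per_class = num_clips / num_classes

print(f"total audio: {total_seconds:.0f} s")
print(f"clips per class: about {round(clips_per_class)}")
```

Under those assumptions, the dataset works out to 72,900 seconds of audio, or roughly 2,600 clips for each class.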
Deployment Success: A Milestone for Kazakh Speech Recognition
Shakhizat and Askat’s project stands as a testament to the power of innovation and the potential of generated speech data in addressing language barriers. Their work paves the way for the development of more inclusive speech recognition systems that can cater to diverse languages and cultures.
Bonus: The approach offers hope for other under-resourced languages too. Where recorded speech is scarce, generated speech data could fill the gap and let more people interact with technology in their native tongue.
The project serves as an inspiration to innovators and researchers worldwide, demonstrating the power of collaboration and the limitless possibilities of technology when harnessed for the greater good.