MFCC Audio processing for speech recognition

Post by Folknology »

Is XMOS, or anyone else here, working on MFCC (Mel Frequency Cepstral Coefficient) audio preprocessing? I need to build keyword/speech recognition implementations. Initially this will be an effort using both XMOS and PC processing. I am interested in finding out how much of this can be done on the XMOS side, based on how many cores have to be employed for it to be useful.
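For reference, the usual per-frame MFCC pipeline is: window the frame, take its power spectrum, pool the spectrum through triangular filters spaced evenly on the mel scale, take the log of each filter energy, and decorrelate the result with a DCT-II. Below is a minimal pure-Python sketch of that textbook pipeline. It is not XMOS's (or anyone's) production code. The function names (`mfcc_frame`, `mel_filterbank`, etc.), the HTK-style mel formula, and the defaults of 26 filters and 13 coefficients are common choices assumed here for illustration. It uses a naive O(N²) DFT for readability. An embedded version would more likely use a fixed-point FFT and a precomputed filter table, and deciding which of these stages run on the XMOS side versus the PC is exactly the partitioning question above.

```python
import math


def hz_to_mel(f):
    """HTK-style mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum |X[k]|^2 / N for bins k = 0..N/2 via a naive O(N^2) DFT."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        spec.append((re * re + im * im) / n)
    return spec


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist,
    returned as one weight vector over the n_fft // 2 + 1 DFT bins per filter."""
    mel_max = hz_to_mel(sample_rate / 2.0)
    mel_points = [i * mel_max / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    bank = []
    for j in range(1, num_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        weights = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):   # rising edge of the triangle
            weights[k] = (k - left) / (centre - left)
        for k in range(centre, right):  # falling edge, peak of 1.0 at centre
            weights[k] = (right - k) / (right - centre)
        bank.append(weights)
    return bank


def dct_ii(values, num_coeffs):
    """Unnormalised DCT-II: c_i = sum_j v_j * cos(pi * i * (j + 0.5) / M)."""
    m = len(values)
    return [sum(v * math.cos(math.pi * i * (j + 0.5) / m) for j, v in enumerate(values))
            for i in range(num_coeffs)]


def mfcc_frame(frame, sample_rate, num_filters=26, num_coeffs=13):
    """MFCCs for one frame: Hamming window -> power spectrum -> mel energies -> log -> DCT."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)))
                for t, x in enumerate(frame)]
    spec = power_spectrum(windowed)
    # Rebuilt per call for brevity; a real implementation would precompute this table.
    bank = mel_filterbank(num_filters, n, sample_rate)
    # Floor the energies so silent frames do not produce log(0).
    log_energies = [math.log(max(sum(w * p for w, p in zip(weights, spec)), 1e-10))
                    for weights in bank]
    return dct_ii(log_energies, num_coeffs)


if __name__ == "__main__":
    sr = 16000
    # One 256-sample frame (16 ms at 16 kHz) of a 1 kHz test tone.
    tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(256)]
    coeffs = mfcc_frame(tone, sr)
    print(len(coeffs), [round(c, 2) for c in coeffs[:3]])
```

In a recogniser this would run on overlapping frames (a 10 ms hop is a common choice), and delta coefficients are often appended before the features go to the keyword/speech model.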

Would love to hear about anyone's experience, feedback or ideas in this area.

regards
Al


Post by mon2 »

Hi Al. We have been reviewing the following working solution for some pending designs; it would work with the XMOS over I2C / SPI (cannot recall the exact details at this time):

https://www.spansion.com/Products/micro ... voice.aspx

The following low-cost kit is an excellent working solution to jump into this field immediately:

http://www.digikey.com/product-search/e ... 34-1086-ND

The device is from Spansion, which is now a Cypress company, and is based on a Cortex-M4 processor core. The solution works well and comes with some demos, including a voice-activated, speaker-independent TV remote control. The evalboard waits for a trigger word / phrase, then recognizes the spoken words / phrases that follow and echoes the recognized speech to the local terminal.

What we have not done yet is expand the demo, which can be done by supplying a text file containing the phonetic breakdown of the words you wish to recognize. That is, training is NOT done by speaking into a microphone for hours, but by supplying a text file with your 'speech dictionary'.

The CPU works in 2 modes, and we are considering using it over the serial interface as a standalone voice recognition solution. Where the XMOS would assist is microphone beamforming / noise cancellation, but at this time we openly do not wish to pay the list price for the XMOS microphone kit, so we are exploring our options. We wish to offer a competing device to the Amazon Echo but without the reported $100M-$200M R&D budget :)
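Since the role proposed for the XMOS here is microphone beamforming, this is a minimal sketch of the simplest such technique, delay-and-sum, for a linear array and a far-field source. It is an illustration under stated assumptions (speed of sound 343 m/s, integer-sample delays, hypothetical function names `steering_delays` and `delay_and_sum`), not the algorithm used by the XMOS microphone kit or its libraries. Each channel is delayed so that a wavefront arriving from the steering angle lines up across all microphones, and the aligned channels are averaged: sound from that direction adds coherently while sound from other directions partly cancels.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, roughly air at 20 degrees C (assumed)


def steering_delays(mic_x, angle_deg, sample_rate, c=SPEED_OF_SOUND):
    """Non-negative integer sample delays that align a far-field plane wave
    arriving from angle_deg (0 = broadside, 90 = from the +x end) across a
    linear array whose microphone x-coordinates (metres) are given in mic_x."""
    s = math.sin(math.radians(angle_deg))
    # A mic with larger x * sin(theta) hears the wavefront earlier by this many seconds.
    lead = [x * s / c for x in mic_x]
    base = min(lead)
    # Delay each mic by how much earlier it heard the wave than the last mic to hear it.
    return [round((t - base) * sample_rate) for t in lead]


def delay_and_sum(channels, delays):
    """Delay channel m by delays[m] samples (zero-padded at the start) and average."""
    n = len(channels[0])
    out = [0.0] * n
    for ch, d in zip(channels, delays):
        for t in range(d, n):
            out[t] += ch[t - d]
    return [v / len(channels) for v in out]


if __name__ == "__main__":
    sr = 10000
    mics = [0.0, 0.0343, 0.0686, 0.1029]  # 3.43 cm spacing (assumed geometry)
    print(steering_delays(mics, 90.0, sr))  # [0, 1, 2, 3]
```

Integer delays limit how finely the beam can be steered at low sample rates; practical beamformers typically use fractional-delay filters or frequency-domain weights, and often add adaptive noise suppression on top.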

PS: The low-cost kit comes with a small microphone which works very well 'out of the box', but we wish to further enhance the solution with our special sauce.

Hope this helps.

Kumar

Post by Folknology »

mon2 wrote: Hi Al. We have been reviewing the following working solution for some pending designs; it would work with the XMOS over I2C / SPI (cannot recall the exact details at this time):

https://www.spansion.com/Products/micro ... voice.aspx

The following low-cost kit is an excellent working solution to jump into this field immediately:

http://www.digikey.com/product-search/e ... 34-1086-ND

Those look like pretty good value; I need to look more at the technical details.
mon2 wrote: What we have not done yet is expand the demo, which can be done by supplying a text file containing the phonetic breakdown of the words you wish to recognize. That is, training is NOT done by speaking into a microphone for hours, but by supplying a text file with your 'speech dictionary'.

That is very interesting and would probably do for keywords; however, more complex commands would require more external processing, I guess.
mon2 wrote: The CPU works in 2 modes, and we are considering using it over the serial interface as a standalone voice recognition solution. Where the XMOS would assist is microphone beamforming / noise cancellation, but at this time we openly do not wish to pay the list price for the XMOS microphone kit, so we are exploring our options. We wish to offer a competing device to the Amazon Echo but without the reported $100M-$200M R&D budget :)

PS: The low-cost kit comes with a small microphone which works very well 'out of the box', but we wish to further enhance the solution with our special sauce.

Hope this helps.

Kumar

Yeah, the XMOS multi-mic kit is unfortunately really expensive; not quite sure why that is.

Thanks for the feedback, Kumar; I am going to look deeper into those ideas/products.

regards
Al

Post by mon2 »

Al, forgot to note that the accuracy of the Spansion (Cypress) solution is very, very high; we can comfortably state > 95%. We recall speaking phrases like "TV volume up / down", and the unit would accurately recognize them. Be sure to get the latest firmware for the evalkit. Spansion support is off-the-charts good (perhaps we were an early adopter when this unit was introduced), and we can clearly tell they have invested heavily in this technology for the auto market, toys, etc.

If XMOS would make their solution more affordable, we would consider reviewing their microphone board kit as well... wink wink.

Also, if you plan for your product to be tethered to the internet, consider interfacing with the Google voice recognition support already built into Google Chrome. That is also an amazing piece of technology. Run Google Chrome with a microphone attached, enable the microphone on the right side of the search bar and then talk away; it is eerie how far this technology has matured. For our product line, we need a fallback plan in case the internet is 'down'.

If you ping me, we can share our Spansion support contacts, should that be of interest.

Post by Folknology »

Just ran into this Matrix board, which is also relevant here: https://creator.matrix.one/#/index