simon: Open Source Speech Recognition

Speaker: Peter Grasch

simon is a speech recognition solution based on Julius and the HTK.

simon is designed to be as flexible as possible and will work with any language or dialect. The reactions to recognition results are completely configurable as well: every single voice command can be configured to the user's needs.

Voice commands are managed through a flexible plugin architecture, which makes the system easily extendable. The current simon version already includes plugins for launching applications, opening URLs, typing pre-configured text snippets and activating keyboard shortcuts, as well as a grid interface to control the mouse, a voice-controlled calculator, a voice-controlled virtual keyboard and more.

A demonstration of simon 0.2 can be found on YouTube (http://www.youtube.com/watch?v=x_9ImaiOISs).

Despite all this flexibility, we have tried to keep the initial learning curve as gentle as possible without sacrificing any of it.

This is why current versions of simon also include what we call "scenarios". Scenarios are "packages" of simon configurations, and each scenario covers one use case of the system. Possible simon scenarios are, for example, "Firefox" (launching and controlling Firefox) or "window management" (closing, moving and resizing windows). Scenarios can easily be created by users and shared with the community.

simon also supports generic, speaker-independent models such as the GPL-licensed models from VoxForge, so users who speak, for example, "standard English" don't need to train the model at all.

Speech recognition is a monumental task, and most open source projects fail because of insufficient manpower. With simon we try a different approach that has one main benefit: the really time-consuming (and therefore expensive) tasks, like recording thousands of hours of speech and generating application-specific configurations, can easily be done by the community itself and arise naturally from the normal workflow of the system. This "crowdsourcing" might not only make speech recognition based on open source principles possible, but also much more effective than relying on expensive professional speakers, which essentially locks users into pre-defined speech patterns.

As a KDE project, we would be proud to present simon at this year's Akademy. The talk will include technical background on how speech recognition works (especially its implementation in simon), show how users can benefit from simon and explain how developers can get involved. There will also be a live demonstration of the current simon version.

Peter Grasch

My name is Peter Grasch. I am the main developer of simon and vice chairman of the association simon listens. I am currently studying computer science at Graz University of Technology.