Speech to Text
Description
In many applications, we need to convert what a user says into text. This is called “speech recognition” or “voice recognition”. Note that this is still far from understanding what the user means, but it is good enough for many use cases.
For example, we can allow a user to say commands like “go” and “stop” to drive a car by voice.
Key Blocks
There are 4 key blocks that work together for speech recognition:
Start Speech Recognition
To start, we need to run this block:
- Language: the first input is the language the user will speak in. Note that even for the same language, such as English, there is more than one accent to choose from, such as “English (United States)” vs “English (United Kingdom)”.
- Sound Name: the second input is optional. If it is not empty, the user’s speech will be saved as a sound under the “Sounds” tab. This is handy if you need to play back what the user actually said.
When you run this block, the playground will ask for your permission to listen through the computer microphone. Once permission is granted, it shows a red microphone icon on the stage to indicate that it is listening. It keeps listening until you stop it.
End Speech Recognition
To end the speech recognition, you can use this block:
After running this block, the program will stop listening to the microphone.
Text from Speech
The recognized text will be stored in this reporter block:
Clear Speech Text
After the user starts the speech recognition, the AI engine keeps updating the value of “text from speech”, so it grows longer as the user speaks. If we want to clear the content of “text from speech”, we can use this block:
After running this block, the value of “text from speech” resets to empty, and it starts to grow again as the user speaks more. In the example below, the user keeps saying “step 1, step 2…” up to “step 7”. The value of “text from speech” keeps updating, but it becomes empty again when we run “clear speech text”:
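For readers who think in text-based code, the grow-then-clear behavior can be modeled with a small Python sketch. This is only a toy stand-in, not the playground's real engine: the `SpeechSession` class, its `hear` method, and the choice to run the clear after “step 4” are all assumptions made for illustration.

```python
class SpeechSession:
    """Toy model of the speech blocks (hypothetical, for illustration only)."""

    def __init__(self):
        self.listening = False
        self.text_from_speech = ""

    def start(self):
        # Models "start speech recognition".
        self.listening = True

    def hear(self, phrase):
        # Simulates the engine recognizing one more phrase from the microphone.
        if self.listening:
            self.text_from_speech = (self.text_from_speech + " " + phrase).strip()

    def clear_speech_text(self):
        # Models "clear speech text": the value resets but listening continues.
        self.text_from_speech = ""

    def end(self):
        # Models "end speech recognition".
        self.listening = False


session = SpeechSession()
session.start()
for n in range(1, 5):
    session.hear(f"step {n}")
before_clear = session.text_from_speech   # text has grown with each phrase

session.clear_speech_text()
after_clear = session.text_from_speech    # empty again

for n in range(5, 8):
    session.hear(f"step {n}")
after_more = session.text_from_speech     # grows again from empty
session.end()

print(repr(before_clear), repr(after_clear), repr(after_more))
```

The point of the model is that clearing does not stop the recognition: only the stored text is reset, and new speech keeps being appended afterwards.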
Example 1
In this example, we will listen to the user for 2 seconds, then make the dog say the recognized text:
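As a rough text-based analogue of this example, the Python sketch below simulates the same flow: start listening, wait 2 seconds, stop, then report the recognized text. The function name `listen_then_say` and the timed phrase list are hypothetical fixtures standing in for the microphone; the real blocks handle timing and recognition themselves.

```python
def listen_then_say(timed_phrases, listen_seconds=2.0):
    """Return what the sprite would say after listening for listen_seconds.

    timed_phrases is a list of (seconds_since_start, phrase) pairs that
    stand in for what the microphone picks up (a made-up test input).
    """
    text_from_speech = ""                    # start speech recognition
    for spoken_at, phrase in timed_phrases:
        if spoken_at <= listen_seconds:      # still inside the waiting period
            text_from_speech = (text_from_speech + " " + phrase).strip()
    # End speech recognition: anything spoken later is never recognized.
    return text_from_speech                  # the dog says this value


said = listen_then_say([(0.5, "hello"), (1.5, "dog"), (3.0, "too late")])
print(said)
```

Anything spoken after the 2-second window is ignored, which matches the idea that the program stops listening once the end block runs.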
Example 2
We do not need to wait until we end the speech recognition to check what the user is saying. Whenever the user completes a sentence, the recognized text is updated. In this example, a forever loop keeps checking the recognized text. If the user says “Stop”, the program ends the speech recognition.
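The polling idea can be sketched in Python as follows. This is a simplified model under stated assumptions: `run_until_stop` is a hypothetical helper, each loop pass represents one completed sentence, and the check looks for the word “stop” anywhere in the recognized text while ignoring case and trailing punctuation. The actual program may compare the text differently.

```python
def run_until_stop(sentences):
    """Check the recognized text after each sentence; end when "stop" is heard."""
    text_from_speech = ""
    snapshots = []                           # what the loop saw on each check
    for sentence in sentences:               # stands in for the forever loop
        text_from_speech = (text_from_speech + " " + sentence).strip()
        snapshots.append(text_from_speech)
        # Normalize case and punctuation, since an engine may return "Stop."
        words = [w.strip(".,!?") for w in text_from_speech.lower().split()]
        if "stop" in words:
            break                            # end speech recognition
    return snapshots


observed = run_until_stop(["go", "turn left", "Stop", "go again"])
print(observed)
```

Because the loop ends as soon as “stop” appears, the final “go again” is never processed in this model.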