<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[An assistant that can see and talk (Difficulty: 3)]]></title><description><![CDATA[<h2><a class="anchor-offset" name="introduction"></a>Introduction</h2>
<p dir="auto"> </p>
<p dir="auto">Large language models today are usually <strong>multi-modal</strong>. This means they can not only <strong>chat using words</strong> but also <strong>understand images</strong>.<br />
This is incredibly useful — sometimes, it’s much easier to show a picture than to try to describe something with words!</p>
<p dir="auto">In this tutorial, you will create an AI assistant that can <strong>see</strong> and <strong>talk</strong>.<br />
The user just needs to <strong>take a picture</strong> with the camera and then <strong>ask the AI any question about that picture</strong>!</p>
<p dir="auto"><img src="https://cdncreaticodecom.b-cdn.net/scratch-gui-projects/forum/0038f976-438e-49f7-b8f1-672df813fcf8.png" alt="448b6d33-2ead-4e45-8773-cc0881336bd5-image.png" class=" img-responsive img-markdown" width="484" height="365" /></p>
<p dir="auto"> <br />
 </p>
<h2><a class="anchor-offset" name="step-1-create-a-new-project"></a>Step 1 - Create a new project</h2>
<p dir="auto"> </p>
<p dir="auto">On <a href="http://CreatiCode.com" target="_blank" rel="noopener noreferrer nofollow ugc">CreatiCode.com</a>, log in and create a new project. Remove the dog sprite, and rename the project to “AI Assistant”.</p>
<p dir="auto"> <br />
 </p>
<h2><a class="anchor-offset" name="step-2-search-for-or-generate-a-backdrop"></a>Step 2 - Search for or generate a backdrop</h2>
<p dir="auto"> </p>
<p dir="auto">When the project starts, we want a cool backdrop to show that this is an AI assistant.</p>
<ul>
<li>Switch to the Stage and add a new backdrop.</li>
<li>Search for something like “a helpful AI assistant” to find interesting designs:<br />
 <br />
<img src="https://cdncreaticodecom.b-cdn.net/scratch-gui-projects/forum/685df1ff-5d1f-4ef6-a531-7f74eb053cab.png" alt="b2b538a9-48e3-480c-b4f4-e8e4db00b7d7-image.png" class=" img-responsive img-markdown" width="1562" height="707" /></li>
</ul>
<p dir="auto"> <br />
You can also generate a new backdrop based on your own idea. For example, suppose we want this assistant to be used as a tour guide, then we can generate the backdrop with a <strong>detailed</strong> description like this:</p>
<pre><code class="language-text">a robot tour guide facing the viewer with both arms open, standing in front of a historical site, cartoon style
</code></pre>
<p dir="auto">And you might get a result like this:</p>
<p dir="auto"><img src="https://ccdncreaticodecom.b-cdn.net/newimages/bg1/robot_tour_guide_cartoon_30352206_663656.webp" alt="alt text" class=" img-responsive img-markdown" width="960" height="720" /></p>
<p dir="auto"> <br />
 </p>
<h2><a class="anchor-offset" name="step-3-show-2-buttons-when-the-green-flag-is-clicked"></a>Step 3 - Show 2 buttons when the green flag is clicked</h2>
<p dir="auto"> </p>
<p dir="auto">Now switch to the empty sprite.</p>
<p dir="auto">We’ll <strong>add two buttons</strong> so the user can pick a camera:</p>
<ul>
<li><strong>Front camera</strong> (for laptops/touchpads/Chromebooks)</li>
<li><strong>Back camera</strong> (for iPads/phones)<br />
 </li>
</ul>
<p dir="auto"><img src="https://cdncreaticodecom.b-cdn.net/scratch-gui-projects/forum/3c76e33b-a151-4736-bf6c-15faa05fac8e.png" alt="6b93ef86-699b-4461-93d8-9dc7dbfb28b9-image.png" class=" img-responsive img-markdown" width="486" height="405" /></p>
<p dir="auto"> <br />
You can use the following code:</p>
<p dir="auto"><img src="https://cdncreaticodecom.b-cdn.net/scratch-gui-projects/forum/e9cc790f-9916-48b1-a16d-59fe6cdc0114.png" alt="92fe6a32-0f07-4d2c-9ced-bdab9f8d1879-image.png" class=" img-responsive img-markdown" width="974" height="372" /></p>
<p dir="auto"> <br />
 </p>
<h2><a class="anchor-offset" name="step-4-show-camera-preview"></a>Step 4 - Show camera preview</h2>
<p dir="auto"> </p>
<p dir="auto">When a button is clicked, the camera view will show up so the user can aim at the object or scene.</p>
<ul>
<li>The only difference between the two buttons is whether you use the front or back camera.</li>
<li>We’ll use the same camera widget name (“<strong>camera1</strong>”) for both options.<br />
 </li>
</ul>
<p dir="auto"><img src="https://cdncreaticodecom.b-cdn.net/scratch-gui-projects/forum/8cb8b488-1328-488e-a8f1-e51cd0f79a1d.png" alt="0ede859c-0d0f-49c5-8093-825d2e2a97d4-image.png" class=" img-responsive img-markdown" width="815" height="313" /></p>
<p dir="auto"> <br />
 </p>
<h2><a class="anchor-offset" name="step-5-prepare-to-take-a-picture"></a>Step 5 - Prepare to take a picture</h2>
<p dir="auto"> </p>
<p dir="auto">Besides showing the camera, we also need to get ready for the user to take a picture.</p>
<p dir="auto">To avoid repeating code, let’s make a custom block called “prepare to take picture”:</p>
<p dir="auto"><img src="https://cdncreaticodecom.b-cdn.net/scratch-gui-projects/forum/73d7e0de-87e1-4890-b831-34b7bdbd613f.png" alt="8deb6639-60a8-497a-b4f7-fe4897bd642b-image.png" class=" img-responsive img-markdown" width="809" height="464" /></p>
<p dir="auto"> <br />
Inside this custom block:</p>
<ul>
<li>Delete the two old buttons.</li>
<li>Add a new button to take a picture.<br />
 </li>
</ul>
<p dir="auto"><img src="https://cdncreaticodecom.b-cdn.net/scratch-gui-projects/forum/94c5e6cf-9c7f-465c-a051-5ffb0bb13a68.png" alt="ce8bd9ac-6500-4a4c-88ec-2d9649616ecf-image.png" class=" img-responsive img-markdown" width="984" height="335" /></p>
<p dir="auto"> <br />
Now the stage will look like this when getting ready to snap a photo:</p>
<p dir="auto"><img src="https://cdncreaticodecom.b-cdn.net/scratch-gui-projects/forum/9bd771b9-d5ee-43e5-9e13-6c7726d37951.png" alt="6cbf3772-ae0f-41b7-a5ed-47722983390d-image.png" class=" img-responsive img-markdown" width="493" height="372" /></p>
<p dir="auto"> <br />
 </p>
<h2><a class="anchor-offset" name="step-6-take-a-picture-and-show-it"></a>Step 6 - Take a picture and show it</h2>
<p dir="auto"> </p>
<p dir="auto">When the user clicks the “Take a picture” button, we will save the current camera view as a costume image named “c”. We will also remove all the widgets (the camera view and the button) so that the newly captured costume image is shown to the user:</p>
<p dir="auto"><img src="https://cdncreaticodecom.b-cdn.net/scratch-gui-projects/forum/e002377f-14ee-4840-8dae-83bd3c27831b.png" alt="31dd923b-69e8-4462-a8d1-dc8fbe0f545d-image.png" class=" img-responsive img-markdown" width="486" height="189" /></p>
<p dir="auto"> <br />
 </p>
<h2><a class="anchor-offset" name="step-7-prepare-for-the-user-question"></a>Step 7 - Prepare for the user question</h2>
<p dir="auto"> </p>
<p dir="auto">After taking the picture, add more blocks to:</p>
<ul>
<li>Create a new <strong>button</strong> for the user to <strong>ask a question</strong> using speech.</li>
<li>Add a <strong>textbox</strong> to show the recognized question.</li>
</ul>
<p dir="auto"><img src="https://forum.creaticode.com/plugins/nodebb-plugin-emoji/emoji/android/1f449.png?v=h4bqj8tg448" class="not-responsive emoji emoji-android emoji--point_right" title=":point_right:" alt="👉" /> Make the textbox background <strong>30% transparent</strong> so the captured costume image stays visible behind it.</p>
<p dir="auto"><img src="https://cdncreaticodecom.b-cdn.net/scratch-gui-projects/forum/f0299ad1-d7f1-4746-a2a2-fe0b2b02f11c.png" alt="3e36e3a2-82c6-4334-8ce3-4deb6a98d7e4-image.png" class=" img-responsive img-markdown" width="1180" height="412" /><br />
 <br />
Result:<br />
 </p>
<p dir="auto"><img src="https://cdncreaticodecom.b-cdn.net/scratch-gui-projects/forum/3ff270ac-7743-4d26-9ecd-b30bc6d30bde.png" alt="6d2508ce-6e56-479b-ad1d-ea953ee611cb-image.png" class=" img-responsive img-markdown" width="486" height="375" /></p>
<p dir="auto"> <br />
 </p>
<h2><a class="anchor-offset" name="step-8-recognize-the-user-s-question"></a>Step 8 - Recognize the user’s question</h2>
<p dir="auto"> </p>
<p dir="auto">When the user clicks “Ask a Question”:</p>
<ul>
<li>Start speech recognition for <strong>8 seconds</strong>, which should be long enough for most questions.</li>
<li>Show the recognized text inside the textbox.</li>
</ul>
<p dir="auto"><img src="https://cdncreaticodecom.b-cdn.net/scratch-gui-projects/forum/ab01e48d-b4d4-46a1-a0b4-e9fee3b71ff3.png" alt="3249a6ef-db5c-49fe-b819-dd234954bd6b-image.png" class=" img-responsive img-markdown" width="687" height="276" /></p>
<p dir="auto"> <br />
You can also use “continuous speech recognition”, and stop recognition when the user has completed a full sentence. To keep it simple, we will just use the time-based cutoff time.</p>
<p dir="auto">To test it, click the “Ask a question” button, and ask a question like “what is this?”, and then it will be recognized and displayed in the textbox:</p>
<p dir="auto"><img src="https://cdncreaticodecom.b-cdn.net/scratch-gui-projects/forum/a381c53f-bb89-4484-bdc1-466c040f720b.png" alt="9e6a4111-0c02-42e2-bcd6-ba9155e7e12c-image.png" class=" img-responsive img-markdown" width="491" height="379" /></p>
<p dir="auto"> <br />
 </p>
<h2><a class="anchor-offset" name="step-9-ask-ai-a-question-about-this-image"></a>Step 9 - Ask AI a question about this image</h2>
<p dir="auto"> </p>
<p dir="auto">Finally, we can <strong>send the picture and the question to the AI</strong>!</p>
<p dir="auto">You’ll need two blocks (the LLM block is wide, so it’s shown in two rows):</p>
<p dir="auto"><img src="https://cdncreaticodecom.b-cdn.net/scratch-gui-projects/forum/613208c7-30a1-48cd-8f7c-739b235e6310.png" alt="d5707c1d-36a5-43a8-be00-8fc26effb4f5-image.png" class=" img-responsive img-markdown" width="996" height="470" /></p>
<p dir="auto"><img src="https://cdncreaticodecom.b-cdn.net/scratch-gui-projects/forum/86d4f240-b72e-4992-84e3-249f3ae34c37.png" alt="175254db-e5db-493c-9b7d-750af1d91d9b-image.png" class=" img-responsive img-markdown" width="886" height="98" /></p>
<p dir="auto"> </p>
<p dir="auto">Here is how it works:</p>
<ol>
<li>
<p dir="auto">Attach the costume image “c” to the chat: this step will <strong>not</strong> send the image to the AI (LLM) yet. It only stores the image as part of the chat. You can attach more than one image to a chat session, but for this project, we only need to attach one image.</p>
</li>
<li>
<p dir="auto">Send a chat message to AI (LLM): this block will send out the prompt together with the image attached above. We will use a simple prompt: “Answer verbally in 50 words:\n”. The keyword “verbally” ensures the AI’s answer is conversational and not too formal. We are also limiting it to within 50 words to avoid lengthy answers.</p>
</li>
</ol>
<p dir="auto">After these 2 blocks run, the AI’s answer will be stored in the variable “result”.</p>
<p dir="auto"> <br />
 </p>
<h2><a class="anchor-offset" name="step-10-display-and-say-the-answer"></a>Step 10 - Display and say the answer</h2>
<p dir="auto"> </p>
<p dir="auto">Once the AI responds:</p>
<ul>
<li><strong>Show</strong> the answer in the textbox.</li>
<li><strong>Speak</strong> the answer out loud!</li>
</ul>
<p dir="auto">Also, make sure to <strong>stop any earlier speech</strong> when the user asks a new question.</p>
<p dir="auto"><img src="https://cdncreaticodecom.b-cdn.net/scratch-gui-projects/forum/7e225855-358f-42a1-b10b-a3c66021b9ee.png" alt="ee98c2fd-79ce-400e-8ba4-7fb3366f42c2-image.png" class=" img-responsive img-markdown" width="1019" height="670" /></p>
<p dir="auto"> <br />
The answer will look like this:</p>
<p dir="auto"><img src="https://cdncreaticodecom.b-cdn.net/scratch-gui-projects/forum/4afd17e9-9f12-4c10-83ca-21c9161faa3f.png" alt="b3ee2f44-e4e1-4720-a50a-622a83a559dd-image.png" class=" img-responsive img-markdown" width="484" height="365" /></p>
<p dir="auto"> <br />
 </p>
<h2><a class="anchor-offset" name="additional-challenges"></a>Additional Challenges</h2>
<p dir="auto"> </p>
<p dir="auto">This project demonstrates how to combine many useful AI tools into one simple app, but it is kept simple intentionally. Here are some ideas you can explore to enhance this tool further:</p>
<ul>
<li>
<p dir="auto"><strong>Handle follow-up questions:</strong><br />
Let users keep asking more questions about the same picture.<br />
Be careful not to re-attach the image again and again. Set the AI to “continue” mode for a smoother conversation.</p>
</li>
<li>
<p dir="auto"><strong>Smarter Speech Recognition:</strong><br />
Instead of waiting exactly 8 seconds, detect when the user finishes talking, or use start/stop buttons.</p>
</li>
<li>
<p dir="auto"><strong>Translate the Assistant:</strong><br />
Make it work in your native language!</p>
</li>
<li>
<p dir="auto"><strong>Customize the AI’s Behavior:</strong><br />
Adjust the prompt to give hints instead of direct answers (for example, for a homework helper version).</p>
</li>
</ul>
]]></description><link>https://forum.creaticode.com/topic/1594/an-assistant-that-can-see-and-talk-difficulty-3</link><generator>RSS for Node</generator><lastBuildDate>Wed, 11 Mar 2026 01:16:02 GMT</lastBuildDate><atom:link href="https://forum.creaticode.com/topic/1594.rss" rel="self" type="application/rss+xml"/><pubDate>Sat, 09 Nov 2024 03:51:52 GMT</pubDate><ttl>60</ttl></channel></rss>