Multi-modal AI takes in more than just text: pictures, sound, and video too.
Old AI could only read text. New 'multi-modal' AI can also look at pictures, listen to your voice, or watch video.
If your AI app has a camera button, snap a photo of an object and ask 'What is this?'
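For a sense of what happens behind that camera button, here is a minimal sketch, assuming the OpenAI Python SDK and a vision-capable model; the file name photo.jpg and the question are placeholders, and other multi-modal APIs follow a similar shape.

```python
import base64
from openai import OpenAI  # assumes the OpenAI Python SDK is installed

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

# Read the snapped photo and encode it as a data URL the API can accept.
with open("photo.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",  # assumption: any vision-capable model works here
    messages=[{
        "role": "user",
        "content": [
            # The text question and the image travel in the same message:
            # that is what makes the request multi-modal.
            {"type": "text", "text": "What is this?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_data}"}},
        ],
    }],
)

# The model answers in plain text, just like a text-only chat.
print(response.choices[0].message.content)
```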
Here's why "Some AI Can See Pictures and Hear Sound" matters: learning about AI is one of the most important skills you can build for the future, and knowing how multi-modal AI uses pictures, sound, and video gives you a concrete advantage.
15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-explorers-foundations-AI-and-the-multi-modal-input-r10a5
1. Which of these is an example of vision AI being used?
2. If you show an AI a photo of your pet and ask 'What is this?', what is the AI doing?
3. What is audio AI mainly used for?
4. A blind person uses an AI app that looks at photos and tells them what's in the picture. What kind of AI is this?
5. Which of these inputs can a multi-modal AI accept?
6. Why is it helpful that AI can now 'see' pictures?
7. What does the term 'multi-modal' mean when describing AI?
8. If an AI app has a camera button, what can you do with it?
9. Which of these is a real example of audio AI?
10. What is the main difference between old AI and new multi-modal AI?
11. A student photographs their homework with their phone and an AI reads the handwritten answers. What is happening?
12. Which of the following can a multi-modal AI accept as input?
13. If you wanted an AI to look at a painting and tell you what it shows, what type of AI would you need?
14. What happens when you talk to an AI instead of typing?
15. Why might someone use the camera feature on an AI app?