The premise
Small models running locally trade peak quality for privacy, offline capability, and zero per-call cost.
What tiny on-device models do well
- Privacy-sensitive text processing
- Offline summarization and classification
- Local autocomplete and quick assistants
- Edge devices and mobile apps
What tiny on-device models cannot do
- Match frontier models on hard reasoning
- Handle very long contexts comfortably
- Replace cloud models for ambiguous, complex prompts
- Stay current — they don't auto-update
End-of-lesson check
15 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-model-families-AI-tiny-models-on-device-r13a3-creators
What approximate parameter count enables AI models to run on typical consumer laptops and phones?
- 100 billion parameters
- 1 trillion parameters
- 500 million parameters
- 4 billion parameters
What is the primary trade-off when using compact on-device AI models instead of large cloud-based models?
- Trading speed for larger context windows and better reasoning
- Trading memory usage for access to the internet
- Trading accuracy for compatibility with older smartphones
- Trading some quality for privacy, offline use, and no per-call costs
Which of these is identified as a suitable use case for on-device AI models?
- Analyzing a 500-page legal document for nuanced arguments
- Classifying emails as spam or not spam in real-time
- Writing long-form novels with complex character arcs
- Answering ambiguous philosophical questions
A student wants to use an on-device model to solve a complex logic puzzle that requires multi-step reasoning. What limitation will they encounter?
- The model will automatically update itself to improve performance
- The model will require an internet connection to access additional data
- The model cannot match frontier models on hard reasoning tasks
- The model will refuse to run on battery power
A user installs an on-device AI app that claims to be fully private. Under what condition might their data still not be truly private?
- If the operating system or app still transmits data to external servers
- If they use the model while wearing headphones
- If they have more than 10GB of free storage
- If they use the model while charging their device
In which scenario would using a tiny on-device model make the most sense compared to calling a large cloud API?
- When summarizing meeting notes offline during a flight with no WiFi
- When translating a 50,000-word document in real-time
- When generating creative fiction with nuanced character development
- When needing the absolute highest accuracy on a critical medical diagnosis
What is quantization in the context of on-device AI models?
- A process for connecting models to cloud servers
- A method to increase model accuracy by adding more parameters
- A technique to compress model size by reducing numerical precision
- A security protocol that encrypts model outputs
An AI application needs to work inside a smart refrigerator that has no internet connection. Which model type would be most appropriate?
- A cloud-based model with streaming capabilities
- A small on-device model quantized for edge deployment
- A model requiring 100 billion parameters
- A frontier model accessed via API
What cost advantage do on-device models offer compared to cloud-based APIs?
- They cost less per user but more per interaction
- They are free to use forever with no limitations
- They eliminate per-call or per-token costs once the model is downloaded
- They require no initial investment but charge annual fees
A developer wants to process a 200-page document through an on-device model for analysis. What limitation will likely prevent this?
- On-device models require subscription fees for long documents
- On-device models cannot process text at all
- On-device models have limited context window handling
- On-device models cannot be used for analysis tasks
How do on-device models stay current with new knowledge compared to cloud models?
- They learn from user interactions over time
- They sync with search engines in real-time
- They automatically fetch updates whenever connected to WiFi
- They do not auto-update and require manual model replacement
Which two model families are specifically named in the lesson as examples of small on-device models?
- GPT and Claude
- Phi and Gemma
- Llama and Mistral
- BERT and RoBERTa
What type of device would most benefit from running a tiny AI model locally rather than calling a cloud API?
- A smartphone app requiring offline voice assistance
- A supercomputer running scientific simulations
- A desktop workstation with multiple GPUs
- A server cluster in a data center
A developer needs to automatically categorize support tickets by topic. Why might a tiny on-device model be suitable for this task?
- Because classification is a simple, well-defined task tiny models can handle
- Because classification needs the latest knowledge from the internet
- Because classification requires complex multi-step reasoning
- Because classification requires emotional intelligence
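To see why classification counts as a narrow, well-defined task, here is a toy rule-based categorizer (all category names and keywords are made up for illustration). A tiny on-device model handles the same job with far more flexibility than fixed keywords, but the point stands: no frontier-scale reasoning or cloud API is needed.

```python
# Hypothetical keyword table for a toy support-ticket categorizer.
KEYWORDS = {
    "billing": ["invoice", "charge", "refund", "payment"],
    "login": ["password", "sign in", "locked out", "2fa"],
    "bug": ["crash", "error", "broken", "freeze"],
}

def categorize(ticket: str) -> str:
    """Return the first category whose keywords appear in the ticket."""
    text = ticket.lower()
    for category, words in KEYWORDS.items():
        if any(w in text for w in words):
            return category
    return "other"

print(categorize("I was charged twice, please refund me"))  # billing
print(categorize("The app crashes on startup"))             # bug
```

The task has a small, fixed output space and short inputs, which is exactly the regime where small local models perform well.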
What functionality would a tiny on-device model likely NOT be able to provide effectively?
- Answering vague or ambiguous questions about abstract topics
- Performing basic spelling and grammar checks
- Generating short replies based on context
- Quick text autocomplete in a messaging app