Creators · Ages 14–17
The full LLM pipeline, agentic AI with OpenClaw + Ollama, subscription-tier literacy, and a real capstone.
Modules · 167
Before we can judge whether an AI is intelligent, we need a framework for what intelligence even means. Draw on Chollet, Dennett, and modern evals.
From raw bytes to deployed model, every ML system follows the same ten-stage pipeline. Master it and you can read any architecture paper.
Attention, positional encoding, residual streams. A walk through the architecture that powers every frontier language model today.
Data is the strategic asset of AI. Understand the supply chain, the legal fight, and the philosophical stakes before you build anything on top.
Dive into the equations that governed the last five years of AI progress, and the fresh questions they raise now that pure scaling is hitting walls.
Emergent abilities make AI both more exciting and more dangerous. How do labs forecast what the next model will do — and what happens when they are wrong?
The terminology ladder of AI capability is loaded. Clarify your definitions and you clarify your whole view of the field.
Writing software on top of an LLM is not like writing software on top of a database. Treat it as a stochastic system or it will bite you.
Open-source AI is both a technical movement and a political one. Understand the arguments so you can pick a stack and defend it.
Every AI breakthrough of the past decade rests on three interacting ingredients. Synthesize everything you have learned into one working model.
Alignment is not a vibes debate. It is a concrete technical problem about getting systems to pursue goals we actually want. Here is what researchers work on when they say they work on alignment.
The AI safety ecosystem is small, influential, and often misunderstood. Here is who does what, how they get funded, and how to tell real work from rhetoric.
Codex CLI is OpenAI's terminal coding agent. It runs locally, supports MCP, and ships a codex cloud mode for background tasks. Let's install it and compare it honestly to Claude Code.
Model Context Protocol is the most important open standard in agents. One protocol, 1,200+ servers, and your agent can plug into almost any system. Here's how it actually works.
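For concreteness, one common client config shape looks like this (the server package and path are illustrative; check your client's docs for the exact file location):

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/projects"]
    }
  }
}
```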
Everything comes together. Design, code, test, secure, and ship a production-quality agent with open-source code you can fork today.
Flux Pro vs. Flux Dev. Midjourney vs. Stable Diffusion. The choice affects product architecture, cost, and what's possible. Here's the honest tradeoff.
Consent, deepfakes, fair use, democratization of creation. The hardest questions in this track don't have clean answers. Let's work through them honestly.
Claude Pro vs Max. ChatGPT Plus vs Pro. Gemini AI Pro vs Ultra. Stop guessing which plan you need. Here's the full map.
Subscription spend on AI can silently hit $100/mo. Learn the usage signals that mean upgrade, and the vibes that just mean temptation.
Going beyond the chat window. When you'd reach for the API, how pricing actually works, and how to start building.
Assemble the four or five AI tools that actually belong in your daily life. A tested template for the stack that earns its keep.
Claude Projects, ChatGPT Projects, Notion AI, Perplexity Spaces. How persistent context changes AI from search box to actual assistant.
Every major AI product has a privacy page you've never visited. Here's what to click, toggle, and delete to keep your data yours.
Brand loyalty is a liability in AI. Learn the muscle memory of switching models, the signals that say 'time to swap,' and the anti-lock-in habits.
Perplexity Comet is a full web browser that treats AI as a first-class citizen. It reads, summarizes, and acts on pages you visit.
Opus 4.7 shipped in April 2026 with a bigger thinking budget and a 1M-token window at standard prices. Here is the architecture, the pricing math, and when the premium is actually worth it.
Grok Vision rounds out xAI's lineup. It is not the strongest visual model, but it has a niche around uncensored scene description and real-time X media.
Qwen 3 VL punches above its weight on vision benchmarks and opens weights for self-hosted OCR and doc AI.
Kimi's Research Mode plans, browses, and synthesizes across dozens of sources. Here is how to get the most out of it.
Black Forest Labs offers three Flux tiers. Schnell is the fast free tier, Pro is the paid flagship. Here is when each wins.
Flux Dev is the LoRA-friendly middle tier of the Flux family. Here is how to train a style on your own art without renting a farm.
Niji is Midjourney's anime-specialist model. Here is how to prompt it and when it beats general Midjourney for stylized art.
SDXL Turbo renders in a single step. That unlocks interactive, typing-to-image experiences you cannot build on slower models.
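A minimal sketch of what that looks like with the diffusers library, following Stability's published usage (treat the model id and settings as assumptions to verify):

```python
import torch
from diffusers import AutoPipelineForText2Image

# SDXL Turbo is distilled to produce a usable image in one denoising step.
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# One step, no classifier-free guidance: fast enough to re-render as the user types.
image = pipe(
    prompt="a watercolor fox in a snowy forest",
    num_inference_steps=1,
    guidance_scale=0.0,
).images[0]
image.save("fox.png")
```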
ElevenLabs v3 clones a voice from seconds of audio. Here is what to build, what to avoid, and how to stay on the right side of consent.
Calculus is where a lot of smart students hit a wall. Wolfram|Alpha and Claude can walk you through every step, but only if you already did the setup work.
AP Bio has roughly a thousand terms and four big concepts. NotebookLM and Claude Projects can turn your textbook into a custom tutor that actually knows what you are studying.
AP Chem punishes careless unit-tracking and rewards practice. AI tools that show every step are perfect for catching where your dimensional analysis went sideways.
Physics problems are 40 percent drawing the right picture. AI models that can see your free-body diagram and critique it are close to having a TA on call.
Debate rewards knowing the other side's best argument better than they do. AI is built for exactly this kind of fast, balanced research.
Literature review in minutes, protein structures on demand, AI-proposed drug candidates. The discovery cycle has compressed — but the human posing the question still sets the direction.
Autodesk Fusion's generative design explores millions of topology options. nTopology and Ansys simulate in hours what used to take weeks. The ME still owns manufacturability.
NVIDIA GR00T, Physical Intelligence π0, and Figure Helix took the vision-language-action paradigm from research paper to factory floor. This is the hottest hardware-software frontier.
Harvey and CoCounsel research case law, draft briefs, and summarize depositions. The paralegal-and-first-year tier of the profession is genuinely shrinking. The judgment tier is thriving. On the research side, Lexis+ AI, Westlaw Precision, Paxton AI, and vLex Vincent search and synthesize case law.
The role has inverted: paralegals who used to do research and doc prep now direct the AI that does it. The job is not gone — but it is changing faster than any legal role.
AlphaSense, Hebbia, and Bloomberg GPT read every filing before you do. The edge is the question you ask and the thesis you write.
McKinsey Lilli, Gamma, and Claude generate first-draft slides and research in minutes. The real consulting work — client relationships and implementation — is more human than ever.
v0, Linear AI, and Dovetail synthesize research, draft PRDs, and ship prototypes in hours. The PM role has leveled up from communicator to quasi-builder.
Species identification from underwater footage used to take a season. A model trained on 8 million fish does it in a single afternoon.
Generative imagery, 3D garment sim, and on-demand pattern-making have collapsed the front end. Taste is still the scarce resource.
AI runs the research and drafts the decks. The strategist still has to decide what a brand means.
Wildfire detection, wildlife cameras, and visitor demand modeling changed the job. The ranger still walks the trail at dawn.
Cursor forked VS Code and rebuilt it around AI. It's now the de facto AI IDE for serious engineers. Deep dive on what makes it different, the Composer agent, and the $500/month enterprise pricing.
Windsurf (from Codeium, acquired by OpenAI in 2025) competes with Cursor via Cascade, its autonomous agent. Deep look at where it's ahead, where it's behind, and the post-acquisition future.
Claude Code runs in your terminal, operates on your actual file system, and treats your whole repo as context. Deep look at why senior engineers prefer it to IDE-based AI.
Codex CLI is OpenAI's open-source terminal coding agent. Look at how it compares to Claude Code, what it does uniquely, and why it matters to non-Anthropic shops.
Zed is a Rust-native code editor that integrates AI collaboration and pair-coding at the architecture level. Look at its strengths as a lightweight Cursor alternative.
Figma's AI features (First Draft, Make Designs, Rename Layers) bring generative design to the industry standard. Deep dive on what it's changed and what's still a gimmick.
Framer's AI turns a prompt into a publishable website with real code. Look at who's using it to ship portfolios and small-biz sites in 2026.
Recraft focuses on style consistency, vector output, and brand workflows — things Midjourney still ignores. Deep dive on why designers and marketers are switching.
Galileo AI (now part of Google) generates high-fidelity UI mockups from prompts. Look at the acquisition, what happened to the product, and how Google Stitch carries the idea forward today.
Uizard turns hand-drawn sketches, screenshots, and prompts into editable UI mockups. Look at whether its 2026 AI upgrades make it a real Figma alternative.
Runway Gen-4 generates cinematic AI video from prompts. Deep look at its industrial-strength features, why studios use it, and the ethical firestorm around it.
ElevenLabs generates synthetic voices indistinguishable from human recordings. Deep dive on voice cloning, dubbing, the consent-and-ethics story, and pricing realities.
Suno generates full songs — vocals, instruments, lyrics — from a text prompt. Deep dive on what it sounds like, the industry lawsuits, and whether it's a toy or a tool.
Descript revolutionized podcast editing by making audio editable as text. Deep dive on Overdub voice cloning, Studio Sound's one-click noise reduction that makes laptop recordings sound studio-quality, and the serious 2025 updates.
Pika Labs built a viral AI video product aimed at creators, not studios. Compare it to Runway and look at where it fits in 2026.
Writer is a full-stack enterprise AI platform with its own models (Palmyra), strict governance, and deep integrations. Look at who chooses it over ChatGPT Enterprise.
Sudowrite is purpose-built for fiction writers. Deep dive on its Story Bible, Brainstorm, Describe, and Expand tools — and why novelists pay $25/month when ChatGPT is cheaper.
ShortlyAI was one of the first GPT-3 writing apps, now owned by Jasper. Look at whether the stripped-down approach still makes sense in 2026.
Zapier built the integration platform that connects 7,000+ apps. Zapier Agents and Zapier Central are its attempt to add AI agents on top. Deep look at where it works and where it breaks.
Motion schedules your tasks into your calendar automatically, rescheduling as priorities change. Look at whether it actually improves productivity or just feels busy.
Reclaim schedules tasks and protects habits on your calendar, but with a gentler touch than Motion. Look at why some users prefer it.
Superhuman was famous for fast email before AI. Now it bundles AI replies, auto-drafting, and AI calendar. Deep look at whether it's worth the premium.
ClickUp is project management, docs, goals, and chat all in one. ClickUp AI is its answer to Notion AI. Look at what it does inside the ClickUp ecosystem.
Consensus searches 200M+ academic papers and gives evidence-based answers. Deep look at how researchers use it, what it does differently from Perplexity, and its limits.
Elicit automates slow parts of academic research: finding papers, extracting data, building literature matrices. Look at how it saves PhDs 20 hours a week.
Gong records, transcribes, and analyzes every sales call to surface what works. Deep dive on what Gong actually does, the 'deal intelligence' features, and why it's $1,500+/seat/year.
Clay scrapes, enriches, and personalizes at scale for sales and marketing. Deep look at what it does, the Claygent agent, and pricing that starts at $149/month.
Lindy builds AI agents that do jobs: handle email, qualify leads, schedule meetings. Deep dive on what it actually delivers vs the marketing.
Vic.ai autonomously processes invoices, codes transactions, and speeds up AP teams. Deep look at what CFOs are buying and where it fails.
Harvey is the AI legal platform deployed at top law firms worldwide. Deep dive on what it does, why firms pay six figures for seats, and the 2026 competitive landscape.
Classes let you bundle data with the behavior that operates on it. You'll build a class for a real thing and use AI to refactor it with confidence.
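A taste of the idea in Python; the Playlist class is a made-up illustration, not the lesson's project:

```python
class Playlist:
    """Bundles the data (tracks) with the behavior that operates on it."""

    def __init__(self, name: str):
        self.name = name
        self.tracks: list[str] = []

    def add(self, track: str) -> None:
        self.tracks.append(track)

    def total(self) -> int:
        return len(self.tracks)

mix = Playlist("study mix")
mix.add("lofi beats")
print(mix.name, mix.total())  # study mix 1
```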
What alignment actually is as a research program, how it is done in practice, what the open problems are, and where the actual papers live. A model that is always helpful will help you do harmful things.
A deep tour of the canonical examples and why specification gaming is not a bug but a structural property of optimization. At the center is Goodhart's Law, originally formulated in monetary policy and now the most-cited one-liner in AI safety.
What a constitution actually contains, how the training loop works, where the research is now, and the honest trade-offs.
Debate, amplification, weak-to-strong, process supervision. Research on how humans supervise models smarter than them.
Sparse autoencoders, features, circuits. How researchers try to see what a model actually thinks, and why it may be the most strategically important safety work.
If you query a closed model enough, you can sometimes reconstruct it. Here is the research on extraction attacks and what it means for proprietary AI.
AI agents can already find some software vulnerabilities and write exploits. What happens when those capabilities scale? A clear-eyed walk through the data.
Four benchmarks dominate modern AI announcements. Know what each measures, how, and where it breaks.
The world's most influential 'leaderboard' for AI is not a test — it is humans voting blindly. Here is how that works.
Born in chess, now everywhere in AI evaluation. Learn why Elo works and where it quietly misleads.
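The whole update rule fits in a few lines. A minimal sketch in Python (K=32 is a conventional choice, not a law):

```python
def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """Update two Elo ratings after one matchup.

    score_a is 1.0 if A wins, 0.0 if A loses, 0.5 for a tie.
    """
    # Expected score from the rating gap: a 400-point edge ~ 10x odds.
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    r_a_new = r_a + k * (score_a - expected_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return r_a_new, r_b_new

# A 1600-rated model beats a 1500-rated one: a small, expected gain.
print(elo_update(1600, 1500, 1.0))
```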
Why the benchmark that was state-of-the-art three years ago is now useless — and what that teaches about measuring AI.
When the test questions quietly end up in the training data, scores lie. Here is how it happens and how to catch it.
Public benchmarks get gamed. Private evaluations tell the truth but cannot be checked. Where is the balance? Third-party evaluators like METR (formerly ARC Evals) and the UK AI Safety Institute run closed evaluations on frontier models.
LLM benchmarks are about single answers. Agent benchmarks measure multi-step real-world task completion. Very different beast.
Evaluating models that see, hear, and read at once requires new kinds of tests. Here are the ones that matter.
Leaderboards are compelling. They are also deeply misleading: prompt wording, sampling settings, number of attempts, and which subset of the benchmark is reported can all swing the ordering. Here is a checklist for real skepticism.
Using one LLM to grade another is the cheapest human-like evaluation you can run. It is also full of traps.
The eval that matters most is the one tied to your real task. Here is a step-by-step way to build one. The rubric is the product: most 'AI product' failures are actually rubric failures.
A golden dataset is a curated set of hard, representative examples you trust completely. It is the backbone of every serious eval.
Prompts are code. Code needs tests. Here is how to stop silently breaking your system each time you tweak a prompt.
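A minimal sketch of the idea with pytest; call_model is a hypothetical placeholder for whatever client and pinned model version you actually use:

```python
import pytest

def call_model(prompt: str) -> str:
    """Placeholder: swap in your real model client, with version pinned."""
    raise NotImplementedError

CASES = [
    ("Summarize: The cat sat on the mat.", "cat"),
    ("Translate to French: hello", "bonjour"),
]

@pytest.mark.parametrize("prompt,must_contain", CASES)
def test_prompt_regression(prompt, must_contain):
    # With the backend pinned, a failure here means your prompt edit
    # changed behavior, not that the model drifted.
    assert must_contain in call_model(prompt).lower()
```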
A model that says 'I am 95 percent sure' and is wrong 40 percent of the time is miscalibrated. Measuring that gap is uncertainty quantification.
A calibrated model's 70 percent means it is right 70 percent of the time. Most LLMs are not calibrated. Here is what that costs you.
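To see the gap concretely, a minimal sketch that buckets predictions by stated confidence (toy data from a deliberately overconfident fake model):

```python
import numpy as np

def calibration_gaps(confidences, correct, n_bins=10):
    """Per confidence bin, compare average stated confidence to actual accuracy."""
    confidences = np.asarray(confidences)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences >= lo) & (confidences < hi)
        if mask.any():
            print(f"{lo:.1f}-{hi:.1f}: said {confidences[mask].mean():.2f}, "
                  f"was right {correct[mask].mean():.2f}")

rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, 1000)
acc = rng.random(1000) < conf * 0.7  # systematically overconfident toy model
calibration_gaps(conf, acc)
```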
Benchmarks measure what you ask. Red-teaming measures what breaks. Learn to test for failure modes, not capabilities: AI red teams probe for harmful outputs, jailbreaks, bias, leakage of training data, and dangerous capabilities.
Asking 'can the model do it?' and 'will doing it cause harm?' are different questions. Both matter.
AI is amazing at things that should be hard and terrible at things that should be easy. That jaggedness is the key to using it well.
Sometimes a network memorizes, then — long after you would have stopped training — suddenly generalizes. That is grokking, a real and weird phenomenon. Beyond the toy setting, it suggests that 'more training' can sometimes qualitatively change a model's behavior: not just improve a score but switch to a different algorithm internally.
Some capabilities grow smoothly with scale. Others seem to appear out of nowhere. Telling them apart is a whole research program, and it turns on one big question: is AI capability a smooth climb or a staircase?
Models trained on one task can often do many others. Understanding why is one of the deepest lessons in modern ML.
Show a model three examples, and it learns the task on the spot — without any weight updates. This is one of the strangest properties of transformers.
Asking a model to 'think step by step' makes it better at hard problems. Here is why, and when it fails.
LLMs are black boxes with billions of parameters. Why is interpretability so hard — and what progress has been made?
AI turns weeks of literature review into days — if you know how to use it. Here is a workflow that actually works.
AI moves so fast that staying current is its own skill. Here is a sustainable system.
NotebookLM turns a pile of PDFs into a searchable, askable brain. Here is how to build a research notebook that keeps paying dividends.
The norms for disclosing AI use in research are still being written. Here is the emerging consensus and how to stay on the right side of it.
The best way to truly understand an AI claim is to try it yourself. Here is how to run a small experiment that actually teaches you something.
An experiment you do not write up is an experiment you will forget. Here is how to write a small findings post people will actually read: exact prompts, model versions, dates, and the raw CSV.
Real data is expensive, private, or scarce. Synthetic data is generated by models themselves. It is rapidly becoming as important as scraped data.
Behind every supervised model is an army of human labelers. Understanding how labeling works is understanding who really builds AI.
The old mantra was more data always wins. The new reality is more complicated. Sometimes a small, hand-crafted dataset beats a giant messy one.
A data card is like a nutrition label for a dataset: who collected it, how, what is in it, and what it should not be used for.
If your training data is 90 percent men, your model will work worse for women. Representation bias is the most pervasive issue in AI.
Measurement bias happens when the thing you measure is a flawed stand-in for what you actually care about. It is subtle and surprisingly common.
Even accurate data can encode an unjust history. The COMPAS recidivism tool shows what happens when AI learns from a biased past.
Every labeled dataset has mistakes. Studies have found error rates of 3 to 6 percent in famous benchmarks like ImageNet. Noisy labels confuse models and mislead evaluations.
If two reasonable humans cannot agree on a label, neither can a model. Inter-annotator agreement tells you if a task is even well-defined.
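Agreement has a standard statistic. A minimal sketch with scikit-learn's Cohen's kappa on toy labels:

```python
from sklearn.metrics import cohen_kappa_score

# Two annotators labeling the same ten items as toxic (1) or not (0).
annotator_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
annotator_b = [1, 0, 1, 0, 0, 0, 1, 1, 1, 0]

# Kappa corrects raw agreement for agreement expected by chance.
# Rough rule of thumb: below ~0.4, the task definition needs work.
print(cohen_kappa_score(annotator_a, annotator_b))
```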
Small populations get hurt first when datasets are built carelessly. Fixing this requires intentional collection, not just better algorithms.
AI has a geography problem. Training data over-represents North America and Europe, and it shows in subtle and not-so-subtle ways.
Native English speakers are about 6 percent of the world, yet English is 50+ percent of the training data. This asymmetry shapes every model we use.
A data audit is a structured process to find bias, errors, and ethical issues before a model goes live. Every creator should know how.
Everyone wants to debias AI. But the literature is full of methods that look good on paper and fail in the wild. Here is the honest scorecard.
Saying the average is $50,000 can mean three different things. Picking the wrong kind of average is how statistics starts lying to you.
Mean tells you the center. Variance and standard deviation tell you the spread. Without both, you are missing half the story.
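A minimal sketch with Python's statistics module, using made-up salaries to show how the averages and the spread diverge:

```python
import statistics

# Made-up salaries: mostly ordinary values, one repeat, one outlier.
salaries = [42_000, 45_000, 45_000, 48_000, 52_000, 55_000, 250_000]

print(statistics.mean(salaries))    # ~76,714: dragged up by the outlier
print(statistics.median(salaries))  # 48,000: the middle person
print(statistics.mode(salaries))    # 45,000: the most common value
print(statistics.stdev(salaries))   # the spread that the averages alone hide
```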
Data comes in shapes. The shape determines which tools you can use, and which assumptions will silently betray you.
Some things grow multiplicatively, not additively. Log scales reveal patterns that linear scales hide, especially for anything related to scale or growth.
A trend that appears in every subgroup can reverse when you combine the groups. This is Simpson's Paradox, and it hides in plain sight.
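The classic kidney-stone numbers, runnable in pandas (these figures are the standard textbook illustration of the paradox):

```python
import pandas as pd

# Treatment A wins in BOTH subgroups, yet loses once the groups are pooled.
df = pd.DataFrame({
    "treatment": ["A", "A", "B", "B"],
    "stone":     ["small", "large", "small", "large"],
    "success":   [81, 192, 234, 55],
    "total":     [87, 263, 270, 80],
})

by_group = df.assign(rate=df.success / df.total)
print(by_group)  # A beats B for small stones AND for large stones

pooled = df.groupby("treatment")[["success", "total"]].sum()
print(pooled.success / pooled.total)  # ...but B beats A overall
```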
A single weird value can distort your entire analysis. But outliers are also where the most interesting stories live. Knowing when to remove them is an art.
Resampling techniques draw new samples from your data to estimate uncertainty, balance classes, or validate models. It is one of the most underused superpowers in statistics.
Bootstrapping estimates the uncertainty of any statistic, even when you have no clean mathematical formula. It is simple, powerful, and surprisingly deep.
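A minimal sketch of a percentile bootstrap for the median, on made-up skewed data:

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.lognormal(mean=10, sigma=1, size=200)  # toy skewed sample, e.g. incomes

# Resample with replacement many times; the spread of the recomputed
# statistic approximates its sampling uncertainty.
medians = [np.median(rng.choice(data, size=data.size, replace=True))
           for _ in range(10_000)]

lo, hi = np.percentile(medians, [2.5, 97.5])
print(f"median = {np.median(data):,.0f}, 95% bootstrap CI = ({lo:,.0f}, {hi:,.0f})")
```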
Ownership of data is not one question but a tangle of rights: copyright, contract, privacy, and control. Untangling them is essential for responsible use.
Violating a website's Terms of Service and violating copyright are different legal problems. Understanding the distinction is critical for data work, starting with the argument AI companies make: that training is transformative fair use.
Europe's General Data Protection Regulation (2018) reshaped how the world handles personal data. Understanding its core concepts is now essential. In 2023, Italy briefly banned ChatGPT over GDPR concerns.
Thousands of companies you have never heard of trade your personal data every second. Understanding this invisible market is understanding modern privacy, because much of the training data for specialized models (ad targeting, credit scoring, risk assessment) comes from brokers.
Many AI companies now offer opt-outs from training. But how well do they actually work, and what are the catches?
A simple 30-year-old text file, robots.txt, is how the web has tried to regulate crawlers. The new ai.txt proposal aims to refine this for the AI era.
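For reference, the file itself is almost self-explanatory. The crawler name shown is OpenAI's documented GPTBot; everything else is illustrative:

```
# robots.txt at the site root
User-agent: GPTBot        # OpenAI's training crawler
Disallow: /

User-agent: *             # everyone else
Disallow: /private/
```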
If you build a dataset, how you license it determines who can use it and how. Picking the right license matters as much as the data itself.
Removing names does not make data anonymous. Combinations of a few seemingly innocent fields can re-identify nearly anyone.
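A minimal sketch of the uniqueness check behind that claim, on toy data (Latanya Sweeney famously showed ZIP code, birth date, and sex alone re-identify most Americans):

```python
import pandas as pd

# Toy "anonymized" records: no names, just three innocent-looking fields.
df = pd.DataFrame({
    "zip":        ["02139", "02139", "02139", "94110", "94110"],
    "birth_year": [2008,     2008,    2009,    2008,    2009],
    "sex":        ["F",      "F",     "F",     "F",     "M"],
})

# How many people share each combination? A group of size 1 means
# that combination alone re-identifies the person.
group_sizes = df.groupby(["zip", "birth_year", "sex"]).size()
print((group_sizes == 1).sum(), "of", len(df), "records are unique")
```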
A complete walkthrough from question to shareable dataset. The first project is the hardest; this lesson gets you to the other side.
Jupyter is the data scientist's notebook. Code, output, and narrative in one document. Learning Jupyter well pays dividends for every future project.
Pandas is the Python library that made data science what it is today. Ten verbs get you through 90 percent of day-to-day data work.
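A taste of a few of those verbs on a made-up table (exactly which ten make the list is a judgment call):

```python
import pandas as pd

df = pd.DataFrame({
    "city":  ["Lagos", "Lagos", "Lima", "Lima", "Lima"],
    "year":  [2023, 2024, 2023, 2024, 2024],
    "sales": [120, 150, 90, 110, 30],
})

# Everyday verbs: filter, sort, group, aggregate, select.
recent = df[df["year"] == 2024]                     # filter rows
top = recent.sort_values("sales", ascending=False)  # sort
by_city = df.groupby("city")["sales"].sum()         # group + aggregate
print(top[["city", "sales"]])                       # select columns
print(by_city)
```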
These two formats are the bread and butter of data interchange. Handling them well means handling edge cases well.
Creating a dataset from scratch teaches you more than using someone else's. Here is how to build a high-quality small labeled dataset for a real task.
Hugging Face Hub is the GitHub of AI data and models. Uploading a dataset there makes it instantly accessible to millions of practitioners.
Claude Shannon turned communication into mathematics and gave AI the substrate it would need.
In 1973, a British mathematician wrote a report that gutted UK AI funding for a decade.
Rumelhart, Hinton, and Williams published the algorithm that would eventually power everything.
In September 2012, a neural network crushed ImageNet and everything about AI changed.
A 2015 paper from Microsoft Research let neural networks go 150 layers deep by adding a shortcut.
Eight Google authors replaced recurrence with attention and quietly launched the modern AI era.
In 2020, a 175 billion parameter model and a parallel paper on scaling laws redefined what bigger could mean.
A 1980 thought experiment asked whether symbol manipulation alone could ever amount to real understanding.
Looking at AI's full history reveals rhythms that help make sense of the present moment.
A feature is a direction in activation space that corresponds to a concept. Finding them — naming them, ranking them, connecting them — is one of the central activities of interpretability research.
A lot of civics class is pretending you read the news. AI makes it possible to actually understand a bill, a court case, or a political ad in under ten minutes.
AI writes Java for you faster than your teacher can say 'Scanner'. Using it without cheating yourself out of the class is the real skill.