AI did not start in 2022. It has decades of wrong turns and breakthroughs behind it. Knowing the history helps you tell hype from real progress.
AI research began in the 1950s. The field has gone through booms and winters — periods of huge funding followed by collapse. Understanding this rhythm helps you calibrate today's excitement.
Early researchers believed intelligence was logic. They wrote programs that manipulated symbols according to formal rules. Expert systems, which boomed in the 1980s, encoded human expert knowledge as if-then rules; MYCIN, a 1970s medical-diagnosis system, was an early example. They worked for narrow problems but crumbled outside them.
Through the 1990s and 2000s, researchers pivoted to data-driven methods. Support vector machines, decision trees, and shallow neural networks dominated. IBM's Deep Blue beat Kasparov at chess in 1997, but it was hand-tuned search, not general intelligence.
In 2012, a neural network called AlexNet won the ImageNet competition by a huge margin, kicking off the deep learning revolution. GPUs, big datasets, and backpropagation combined to finally make deep networks trainable. By 2016, AlphaGo had beaten the world champion at Go, a feat many experts had expected to be years away.
The 2017 paper *Attention Is All You Need* introduced the transformer architecture. It replaced the recurrent networks used for language with a simpler, more parallel structure. Every modern LLM (GPT, Claude, Gemini, Llama) is a transformer at heart.
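If you want to see the core operation that paper introduced, here is a minimal NumPy sketch of scaled dot-product attention. The function name, shapes, and random numbers are ours, purely for illustration: the point is that every position attends to every other position in one matrix multiply, which is what makes the architecture so parallel.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Toy sketch of the attention operation at the heart of the transformer."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                        # how much each query matches each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)          # softmax over positions
    return weights @ V                                      # weighted mix of the value vectors

# Toy example: 4 token positions, 8-dimensional vectors (made-up numbers)
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```

Real transformers stack many of these operations with multiple heads, learned projections, and position information, but this single step is the reason the whole sequence can be processed in parallel instead of token by token.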
| Year | Milestone |
|---|---|
| 2017 | Transformer architecture published |
| 2018 | BERT and GPT-1 released |
| 2020 | GPT-3 shows few-shot learning |
| 2022 | ChatGPT makes AI mainstream |
| 2023-2024 | GPT-4, Claude 3, Gemini, open-source Llama |
| 2025-2026 | Reasoning models, multimodality, agentic systems |
> AI winters end not with a new theory, but with enough compute.
>
> — A long-time researcher
The big idea: today's AI is the fourth major wave, built on GPUs, internet-scale data, and the transformer. Knowing the cycle helps you see that hype is not new, but neither is real progress.
15 questions · take the quiz online for instant feedback at tendril.neural-forge.io/learn/quiz/end-builders-history-of-ai
1. What core assumption drove the symbolic AI of the 1950s through the 1980s?
2. Which term describes rule-based programs like MYCIN that encoded expert knowledge as if-then rules?
3. Why did expert systems fall out of favor?
4. Which methods dominated the data-driven era before deep learning?
5. Why is Deep Blue's 1997 win over Kasparov not considered general intelligence?
6. What happened in 2012 that kicked off the deep learning revolution?
7. Which three ingredients finally made deep networks trainable?
8. What milestone did AlphaGo reach in 2016?
9. What did the paper "Attention Is All You Need" introduce?
10. What did the transformer replace in language modeling?
11. Which architecture sits at the heart of GPT, Claude, Gemini, and Llama?
12. What is an AI winter, and what has historically ended one?
13. What capability did GPT-3 demonstrate in 2020?
14. Which 2022 release made AI mainstream?
15. According to the lesson's big idea, what is today's wave of AI built on?