Intermediate Course
Transformer Architecture Intermediate
49 lessons across 7 chapters. Every lesson is standalone — start anywhere.
1 Scaling Laws 7 lessons
2 Modern Architecture Improvements 7 lessons
3 Mixture of Experts Architecture 7 lessons
4 Context Window Architecture 7 lessons
5 Training vs Inference Architecture 7 lessons
6 Architecture Comparisons 7 lessons
7 Common Misconceptions 7 lessons
1 Transformers do not have memory: correction
2 Bigger context equals better understanding: nuance
3 Attention equals the model understands: correction
4 Parameters equal intelligence: nuance
5 All tokens treated equally: correction
6 Training equals memorization: nuance
7 Architecture vs training data: what matters more