Why this series?
I have a strong interest in both math and coding, and neural networks sit right at the intersection of the two. They are incredibly powerful architectures that can seem almost magical at first glance, but underneath it all lies a great deal of fascinating mathematics and many elegant ideas.
Since I usually take notes while studying, I decided to turn them into written explanations. This way I can both strengthen my own understanding and retention, and hopefully spark curiosity in others who might be interested in this field.
In this series, as I learn new concepts, I’ll write new blog posts that mix math and theoretical deep dives with practical coding examples. These posts are not meant to be a primary reference, but rather an accessible and engaging exploration. For this reason, I’ll list below all the sources I have studied to deepen my knowledge of the topic.
Structure of the Series
This series will grow step by step as I deepen my understanding of neural networks. Each post focuses on a specific building block. For now, the planned chapters are:
- Introduction to Neural Networks
- Linear Regression with Neural Networks
- A Simple N-gram Model to Generate Italian Names
- Key Ideas Behind the Transformer Architecture
- Training a Small GPT on a Curated Dataset
Sources
Since this series is meant as a personal exploration rather than a formal reference, I’ll list here the materials I study along the way. These sources are diverse — books, academic papers, online courses, blog posts, and videos — so that anyone interested can follow the same path and dig deeper.
Books
- Deep Learning — Ian Goodfellow, Yoshua Bengio, Aaron Courville
- Neural Networks and Deep Learning — Michael Nielsen
- Dive into Deep Learning — Aston Zhang, Zachary C. Lipton, Mu Li, Alexander J. Smola
- Concise Machine Learning — Jonathan Richard Shewchuk
Academic Papers
- A Neural Probabilistic Language Model — Bengio et al., 2003
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift — Ioffe & Szegedy, 2015
- Rethinking “Batch” in BatchNorm — Wu & Johnson, 2021
- A Few Useful Things to Know about Machine Learning — Pedro Domingos, 2012
- Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification — He et al., 2015
- WaveNet: A Generative Model for Raw Audio — van den Oord et al., 2016
- Dropout: A Simple Way to Prevent Neural Networks from Overfitting — Srivastava et al., 2014
- Deep Residual Learning for Image Recognition — He et al., 2015
- Attention Is All You Need — Vaswani et al., 2017
Blogs & Articles
- Andrej Karpathy’s writing — the Zero to Hero series, plus his blog and personal website
Videos
- 3Blue1Brown — Neural Networks series
- Yannic Kilcher — Paper walkthroughs (Transformers, GPTs, etc.)
- Andrej Karpathy — Neural Networks: Zero to Hero series
Conclusion
I hope you'll find something useful in this series, and I encourage you to keep exploring this field!