Why this series?
I have a strong interest in both math and coding, and neural networks sit right at the intersection of the two. They are incredibly powerful architectures that can seem almost magical at first glance, but underneath it all lies a great deal of fascinating mathematics and many elegant ideas.
Since I usually take notes while studying, I decided to turn them into written explanations. This way I can both strengthen my own understanding and retention, and hopefully spark curiosity in others who might be interested in this field.
In this series, as I learn new concepts, I’ll write new blog posts that mix math and theoretical deep dives with practical coding examples. These posts are not meant to be a primary reference, but rather an accessible and engaging exploration. For this reason, I’ll list below all the sources I have studied to deepen my knowledge of the topic.
Structure of the Series
This series will grow step by step as I deepen my understanding of neural networks. Each post focuses on a specific building block. For now, the planned chapters are:
- Introduction to Neural Networks
- Linear Regression with Neural Networks
- A Simple N-gram Model to Generate Italian Names
- Key Ideas Behind the Transformer Architecture
- Training a Small GPT on a Curated Dataset
Sources
Since this series is meant as a personal exploration rather than a formal reference, I’ll list here the materials I study along the way. These sources are diverse — books, academic papers, online courses, blog posts, and videos — so that anyone interested can follow the same path and dig deeper.
Books
- Deep Learning — Ian Goodfellow, Yoshua Bengio, Aaron Courville
- Neural Networks and Deep Learning — Michael Nielsen
- Dive into Deep Learning — Aston Zhang, Zachary C. Lipton, Mu Li, Alexander J. Smola
- Concise Machine Learning — Jonathan Richard Shewchuk
Academic Papers
- A Neural Probabilistic Language Model — Bengio et al., 2003
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift — Ioffe & Szegedy, 2015
- Rethinking “Batch” in BatchNorm — Wu & Johnson, 2021
- A Few Useful Things to Know about Machine Learning — Pedro Domingos, 2012
- Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification — He et al., 2015
- WaveNet: A Generative Model for Raw Audio — van den Oord et al., 2016
- Dropout: A Simple Way to Prevent Neural Networks from Overfitting — Srivastava et al., 2014
- Deep Residual Learning for Image Recognition — He et al., 2015
- Attention Is All You Need — Vaswani et al., 2017
Blogs & Articles
- Andrej Karpathy’s writing — the Zero to Hero series, plus his blog and personal website
Videos
- 3Blue1Brown — Neural Networks series
- Yannic Kilcher — Paper walkthroughs (Transformers, GPTs, etc.)
- Andrej Karpathy — Neural Networks: Zero to Hero series
Conclusion
I hope you'll find something useful in this series, and I encourage you to keep exploring this field!