best way to learn about transformers


Summary of results


Introductory Resources

Written Guides and Tutorials

Foundational Papers

Advanced and Specialized Resources

Practical Implementation

Additional Tips

1.

This Intro to Transformers is helpful to get some basic understanding of the underlying concepts, and it comes with a really succinct history lesson as well. https://www.youtube.com/watch?v=XfpMkf4rD6E

3.

There are millions of "Transformers Explained" blog posts by now. The one I got the most out of is "Transformers from Scratch" by Peter Bloem:

http://peterbloem.nl/blog/transformers

4.

For a while now, an answer I've seen is to start with "Attention Is All You Need", the original Transformers paper. It's still pretty good, but over the past year I've led a few working sessions on grokking transformer computational fundamentals and they've turned up some helpful later additions that simplify and clarify what's going on.

You can quickly get overwhelmed by the million good resources out there so I'll keep it to these three. If you have a strong CS background, they'll take you a long way:

(1) Transformers from Scratch: https://peterbloem.nl/blog/transformers

(2) Attention Is All You Need: https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547de...

(3) Formal Algorithms for Transformers: https://arxiv.org/abs/2207.09238

5.

I found this article on “transformers from scratch”[0] to be a perfect (for me) middle ground between high level hand-wavy explanations and overly technical in-the-weeds academic or code treatments.

[0] https://e2eml.school/transformers.html

6.

While they are largely obsolete for practical purposes, learning about them is still valuable, as they illustrate the natural evolution in the thought process behind the development of transformers.

7.

To be honest, I'd start with some introduction to Transformer YouTube videos. They'll cover a lot of these terms and you'll then have a better understanding to find additional resources.

8.

So how practical is learning to create your own transformers if you can't afford a giant amount of resources to train them?

9.

> Transformer learning explained

Well, "explained" seems like a stretch; I would rather call it a mathematical derivation of the operation of a transformer, which is certainly interesting for some specialists.

10.

This link was posted here recently, and was the most understandable explanation I've found so far: https://e2eml.school/transformers.html

11.

An early explainer of transformers, which is a quicker read, that I found very useful when they were still new to me, is The Illustrated Transformer[1], by Jay Alammar.

A more recent academic but high-level explanation of transformers, very good for detail on the different architectural variants (e.g. encoder-decoder vs. decoder-only), is Formal Algorithms for Transformers[2], from DeepMind.

[1] https://jalammar.github.io/illustrated-transformer/

[2] https://arxiv.org/abs/2207.09238

12.

The best way to understand transformers is to take Andrej Karpathy's course on YouTube. With a keyboard and a lot of focused time.

13.

I remember looking into this article. It was really helpful for me to understand transformers. Although the OP's article is detailed, this one is concise. Here's the link: https://blue-season.github.io/transformer-in-5-minutes

14.

To be honest, for transformers just go to huggingface.co and see what interests you. They have tons of examples to run and they also link to all the papers in the documentation. It doesn't get much easier to get into it. Even for the more recent stuff like vision transformers and diffusion models.

15.

I'm the author of https://jalammar.github.io/illustrated-transformer/ and have spent years since introducing people to Transformers and thinking of how best to communicate those concepts. I've found that different people need different kinds of introductions, and the thread here includes some often cited resources including:

https://peterbloem.nl/blog/transformers

https://e2eml.school/transformers.html

I would also add Luis Serrano's article here: https://txt.cohere.com/what-are-transformer-models/ (HN discussion: https://news.ycombinator.com/item?id=35576918).

Looking back at The Illustrated Transformer, when I introduce people to the topic now, I find I can hide some complexity by omitting the encoder-decoder architecture and focusing only on one. Decoders are great because now a lot of people come to Transformers having heard of GPT models (which are decoder only). So for me, my canonical intro to Transformers now only touches on a decoder model. You can see this narrative here: https://www.youtube.com/watch?v=MQnJZuBGmSQ

16.

Those Computerphile videos[0] by Rob Miles helped me understand transformers. He specifically references the "Attention is all you need" paper.

And for a deeper dive, Andrej Karpathy has this hands-on video[1] where he builds a transformer from scratch. You can check out his other videos on NLP as well; they are all excellent.

[0] https://youtu.be/rURRYI66E54, https://youtu.be/89A4jGvaaKk

[1] https://youtu.be/kCc8FmEb1nY

17.

Jay Alammar's Illustrated Transformer, although this too is detailed. I think it's still worth taking a look, because I really don't think that people have yet "compressed" what transformers do intuitively. None of the concepts in these networks are particularly hard math - it's basic algebra. But the overall construction is complicated.

https://jalammar.github.io/illustrated-transformer/

18.

Here's my attempt at a simple explanation of transformers. I would love feedback on whether I've got it right and how I could improve it. Cheers

19.

For those that want a high level overview of Transformers, we recently covered it in our podcast: https://www.youtube.com/watch?v=Kb0II5DuDE0

20.

Would this teach transformers? Or is that something else?

Also any tips for finding a study group for learning the large language models? I can’t seem to self motivate.

21.

Every time I need a refresher on transformers, I read the same author's post on transformers. Looking forward to this one!

23.

For specifically understanding transformers, this (w/ maybe GPT-4 by your side to unpack jargon/math) might be able to get you from lay-person to understanding enough to be dangerous pretty quickly: https://sebastianraschka.com/blog/2023/llm-reading-list.html

24.

Without animated visuals, I don't think any non-math/non-ML person can ever get a good understanding of transformers.

You will need to watch videos.

Watch this playlist and you will understand: https://youtube.com/playlist?list=PLaJCKi8Nk1hwaMUYxJMiM3jTB...

Then watch this and you will understand even more: https://youtu.be/g2BRIuln4uc

Finally, watch this playlist: https://youtube.com/playlist?list=PL86uXYUJ7999zE8u2-97i4KG_...

26.

If you'd prefer something readable and explicit, instead of empty handwaving and UML-like diagrams, read "The Transformer model in equations" [0] by John Thickstun [1].

[0] https://johnthickstun.com/docs/transformers.pdf

[1] https://johnthickstun.com/docs/

27.

Besides everything that was mentioned here, what made it finally click for me early in my journey was running through this excellent tutorial by Peter Bloem multiple times (highly recommend): https://peterbloem.nl/blog/transformers

28.

The Illustrated Transformer is pretty great. I was pretty hazy after reading the paper back in 2017 and this resource helped a lot.

https://jalammar.github.io/illustrated-transformer/

29.

I thought I understood transformers well, even though I had never implemented them. Then one day I implemented them, and they didn't work/train nearly as well as the standard pytorch transformer.

I eventually realized that I had ignored the dropout, because I thought my data could never overfit. (I trained the transformer to add numbers, and I never showed it the same pair twice.) Turns out dropout has a much bigger role than I had realized.

TL;DR: just go and implement a transformer. The more from scratch, the better.

Everyone I know who tried it ended up learning something they hadn't expected, from how training is parallelized over tokens down to how backprop really works. It's different for every person.
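To make that advice concrete, here is a minimal sketch of the core piece you would build: single-head causal self-attention with (inverted) dropout on the attention weights, the exact component the parent comment got tripped up on. This is an illustrative NumPy toy (all names and shapes are made up for the example), not the standard PyTorch implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(X, Wq, Wk, Wv, drop_p=0.0, rng=None):
    """Single-head causal self-attention. X: (T, d) token embeddings."""
    T, d = X.shape
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # (T, T) similarity scores
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)  # hide future tokens
    scores = np.where(mask, -np.inf, scores)
    attn = softmax(scores)
    if drop_p > 0.0:  # the dropout the parent comment skipped at first
        rng = rng or np.random.default_rng(0)
        keep = rng.random(attn.shape) >= drop_p
        attn = attn * keep / (1.0 - drop_p)           # inverted dropout scaling
    return attn @ V

# Toy check: with identity projections and no dropout, the first token
# can only attend to itself, so its output equals its own value vector.
X = np.arange(12, dtype=float).reshape(4, 3)
I = np.eye(3)
out = causal_self_attention(X, I, I, I)
```

A real implementation would add multiple heads, output projection, residual connections, and layer norm, but each of those is a similarly small step once this part works.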

30.

I wanted to talk about what powers LLMs, which I believe is important. The answer to that is transformers. While I may not have delved deeper into how a transformer actually works, I tried to explain the concepts in the simplest way possible.

31.

It's also important to learn how to "teach yourself".

Understanding transformers will be really hard if you don't understand basic fully connected feedforward networks (multilayer perceptrons). And learning those is a bit challenging if you don't understand a single unit perceptron.

Transformers have the additional challenge of somewhat odd terminology. Keys, queries, and values kind of make sense coming from the traditional information retrieval literature, but they're more of a metaphor in the attention mechanism. "Attention" and other mentalistic/anthropomorphic terminology can also easily mislead intuitions.

Getting a good "learning path" is usually a teacher's main task, but you can learn to figure those by yourself by trying to find some part of the thing you can get a grasp of.

Most complicated seeming things (especially in tech) aren't really that complicated "to get". You just have to know a lot of stuff that the thing builds on.

32.

Karpathy gave a good high-level history of the transformer architecture in this Stanford lecture: https://youtu.be/XfpMkf4rD6E?si=MDICNzZ_Mq9uzRo9&t=618

33.

Is it only me, or does this article, with its many high-level, vague phrases and anecdotes that skip the actual essence of the smart tricks making transformers computationally efficient, actually make it harder to grasp how transformers "really work"?

I recommend the videos from Andrej Karpathy on this topic. Well delivered, clearly explaining the main techniques, and providing a Python implementation.

34.

I endorse all of this and will further endorse (probably as a follow-up once one has a basic grasp) "A Mathematical Framework for Transformer Circuits" which builds a lot of really useful ideas for understanding how and why transformers work and how to start getting a grasp on treating them as something other than magical black boxes.

https://transformer-circuits.pub/2021/framework/index.html


Built by @jnnnthnn