
Quiz 2 - Question 2

Consider a transformer model that uses 8 attention heads. If the embedding dimension is 512, what is the usual dimension of the output vector of each head?
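The original Transformer paper sets each head's key and value size to d_k = d_v = d_model / h. Under that convention, the per-head size is the embedding dimension divided by the number of heads.

The short Python sketch below works through that arithmetic. It rests on a few assumptions:

- The helper names `head_dim` and `split_heads` are hypothetical.
- Slicing a plain list stands in for the reshape that many implementations apply after the joint Q/K/V projection.
- It is not a full attention layer.

```python
def head_dim(d_model: int, num_heads: int) -> int:
    """Per-head size under the common convention d_k = d_v = d_model / h."""
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    return d_model // num_heads


def split_heads(vector: list[float], num_heads: int) -> list[list[float]]:
    """Split one projected d_model-sized vector into num_heads equal chunks.

    Hypothetical helper: mimics the reshape step, not the learned projections.
    """
    d = head_dim(len(vector), num_heads)
    return [vector[i * d:(i + 1) * d] for i in range(num_heads)]


d_model, num_heads = 512, 8
print(head_dim(d_model, num_heads))  # 64

chunks = split_heads([0.0] * d_model, num_heads)
print(len(chunks), len(chunks[0]))  # 8 64
```

Concatenating the 8 head outputs restores a 512-dimensional vector (8 × 64 = 512). The output projection W^O then maps it back to the model dimension. This is why the per-head size is usually chosen so that the heads divide d_model evenly.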

This exercise is part of the course

Google DeepMind: Discover The Transformer Architecture
