Quiz 4 - Question 1
Assume you are training a small language model on the following ten sentences of varying length.
"I love NLP" (3 words)
"Natural language processing is fun" (5 words)
"Transformers are great" (3 words)
"We learn about n-grams and padding" (6 words)
"Probability plays a key role" (5 words)
"Batching helps with computational efficiency" (5 words)
"Padding sequences ensure uniform lengths" (5 words)
"Knowing your dataset is important" (5 words)
"Stochasticity introduces randomness" (3 words)
"Language models generate predictions" (4 words)
If you pad every sequence to the length of the longest sequence, how many pad tokens do you need to insert in total across all sequences?
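
As a sanity check, here is a minimal Python sketch of the computation, assuming whitespace tokenization (so hyphenated forms like "n-grams" count as one word, matching the counts listed above):

```python
# Count the pad tokens needed to bring every sentence up to the
# length of the longest sentence (whitespace tokenization assumed).
sentences = [
    "I love NLP",
    "Natural language processing is fun",
    "Transformers are great",
    "We learn about n-grams and padding",
    "Probability plays a key role",
    "Batching helps with computational efficiency",
    "Padding sequences ensure uniform lengths",
    "Knowing your dataset is important",
    "Stochasticity introduces randomness",
    "Language models generate predictions",
]

lengths = [len(s.split()) for s in sentences]   # word count per sentence
max_len = max(lengths)                          # longest sequence: 6 words
total_pads = sum(max_len - n for n in lengths)  # pads needed per sentence, summed
print(total_pads)
```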
This exercise is part of the course
Google DeepMind: Build Your Own Small Language Model