What is Llama?
1. What is Llama?
Welcome to this course on working with Llama 3.
2. Meet the instructor
I'm Imtihan Ahmed, a machine learning engineer with six years of experience building AI at scale with LLMs such as Llama, and I'll be guiding you through this course.
3. What is Llama 3?
Imagine having an AI that can summarize reports in seconds,
4. What is Llama 3?
analyze data,
5. What is Llama 3?
or even assist with code - all running privately on our own machines without sharing our data.
6. What is Llama 3?
This is Llama 3, an open-source large language model developed by Meta.
7. What is Llama 3?
It is designed to understand and generate human-like text,
8. What is Llama 3?
and has been trained on massive amounts of data: the equivalent of about two thousand times the whole of Wikipedia. The model has been released for anyone to download and use through various open-source libraries.
9. Why run Llama 3 locally
For example, Aitomatic, a company specializing in industrial automation, uses Llama models to help process engineers predict equipment maintenance needs. Like other companies and practitioners running Llama locally, they benefit from:
10. Why run Llama 3 locally
Privacy and security, as data doesn't leave their systems;
11. Why run Llama 3 locally
Cost efficiency, as they don't have to cover any API costs;
12. Why run Llama 3 locally
And finally, the possibility to modify the model independently.
13. Using Llama locally
Whether we have a cloud server, an industrial PC, or even just our laptop with Python installed, we can use the llama-cpp-python library to run Llama locally. To install it, we run pip install llama-cpp-python.
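As a quick sketch, we could confirm the installation from Python; note that exposing a __version__ attribute is an assumption about the package, not something stated in this course:

    # Run once in a terminal (command quoted above):
    #   pip install llama-cpp-python

    # The installed package is imported under the name llama_cpp:
    import llama_cpp
    print(llama_cpp.__version__)  # assumes the package exposes __version__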
14. Asking questions to Llama
Once installed, the llama-cpp library is accessible as llama_cpp. To interact with a model, we initialize an instance of the Llama class, which loads the LLM for text generation, allowing us to send prompts and receive responses. A key parameter is the model path, where we pass the location of the saved model file. Llama models are typically stored in a format called GGUF, which is optimized for fast inference. If the model is not already available, it can be downloaded from sources like Meta's official releases or third-party repositories. Once initialized, we can pass the model a question. For example: "What are some ways to improve customer retention?"
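As a minimal sketch of these steps (the GGUF file name below is a placeholder, not an official release name; adjust it to wherever your model file is saved):

    from llama_cpp import Llama

    # Load the model from a local GGUF file (placeholder path).
    llm = Llama(model_path="models/llama-3-8b.Q4_K_M.gguf")

    # Ask the question as a plain-text prompt; max_tokens caps the
    # length of the generated reply.
    output = llm(
        "What are some ways to improve customer retention?",
        max_tokens=200,
    )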
15. Asking questions to Llama
In the background, the model processes the question, or prompt, by drawing on patterns learned during training and predicting the most likely next words to form a response.
16. Unpacking the output
When we run a completion, Llama 3 returns a structured dictionary response, where the model's reply is stored alongside other information about the call.
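For orientation, the response follows an OpenAI-style completion layout. The sketch below shows its rough shape with placeholder values rather than real model output:

    # Approximate shape of the returned dictionary (values illustrative):
    {
        "id": "cmpl-...",
        "object": "text_completion",
        "created": 1700000000,
        "model": "models/llama-3-8b.Q4_K_M.gguf",
        "choices": [
            {
                "text": "1. Personalize the customer experience...",
                "index": 0,
                "logprobs": None,
                "finish_reason": "stop",
            }
        ],
        "usage": {"prompt_tokens": 12, "completion_tokens": 200, "total_tokens": 212},
    }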
17. Unpacking the output
To extract just the model's response, we access the "text" field inside the first element of "choices", a list that contains one or more response objects. As we can see after unpacking the output, the model suggested some key areas to improve customer retention.
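Continuing the sketch above, a single indexing expression pulls out the reply:

    # "choices" is a list; take its first element and read the "text" field.
    reply = output["choices"][0]["text"]
    print(reply)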
18. Let's practice!
Let's ask Llama some questions!