1. Deployment
Let's start the operational phase.
2. LLM lifecycle: Deployment
Deployment is about making the application available to a wider audience.
3. Moving to deployment
As we prepare to deploy, it's worth noting that there's no one-size-fits-all approach. The key factor is the infrastructure we're planning to use.
An application may include chain or agent logic, a vector database, the LLM itself, and more. Each component needs to be deployed and to work with the others. We recommend working through a step-by-step list of deployment considerations.
4. Step 1: Choice of hosting
First, we decide where to host the application components. We can choose between private or public cloud services, or on-premise hosting, depending on what our organization requires. Many cloud providers offer easy-to-use solutions for hosting and deploying LLMs.
5. Step 2: API design
Next, we plan which parts of our system need to be accessible for communication. An API, or application programming interface, acts like a messenger, letting different software talk to each other. It's a set of rules that developers follow to request and share information between systems. APIs have specific locations, called endpoints, where they send and receive data.
Designing APIs affects scalability, cost, and infrastructure needs. Each component, like the LLM or vector database, could have its own endpoint for better scalability, though this increases cost and infrastructure requirements.
Endpoints can be private or public, so securing them is crucial; access is often controlled with API keys.
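To make this concrete, here is a minimal sketch of what an LLM endpoint protected by an API key could look like in Python with FastAPI. The /generate path, the x-api-key header, and the echo response are illustrative assumptions, not a prescribed design.

```python
# Minimal sketch: an LLM endpoint guarded by an API key (illustrative only).
import os

from fastapi import FastAPI, Header, HTTPException
from pydantic import BaseModel

app = FastAPI()
# Assumed to be provided via the environment at deploy time.
API_KEY = os.environ.get("API_KEY", "change-me")

class PromptRequest(BaseModel):
    prompt: str

@app.post("/generate")
def generate(request: PromptRequest, x_api_key: str = Header(default="")):
    # Reject callers that don't present the expected key.
    if x_api_key != API_KEY:
        raise HTTPException(status_code=401, detail="Invalid API key")
    # A real system would call the deployed LLM or chain here.
    return {"completion": f"Echo: {request.prompt}"}
```

Served with, for example, uvicorn, this becomes one endpoint among several; the vector database or other components could expose their own endpoints in the same way.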
6. Step 3: How to run
The last step is deciding how each component will run. Options include containers, serverless functions, and managed cloud services.
Each option involves trade-offs in cost, scalability, efficiency, and flexibility.
Containers are a popular choice for their flexibility, adaptability, and scalability. They are lightweight, standalone software packages containing everything needed to run an application. Plus, there are specialized containers designed for running LLMs.
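As an illustration, a container for such an application can be built and launched programmatically with the Docker SDK for Python (the docker package); the image tag and port mapping below are assumptions, and the same steps are commonly done with the docker build and docker run CLI commands.

```python
# Sketch: build and run an application container with the Docker SDK for Python.
import docker

client = docker.from_env()

# Build an image from the Dockerfile in the current directory (tag is assumed).
image, build_logs = client.images.build(path=".", tag="llm-app:latest")

# Run the container, mapping the app's port 8000 to the same port on the host.
container = client.containers.run(
    "llm-app:latest",
    detach=True,
    ports={"8000/tcp": 8000},
)
print(container.status)
```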
7. CI/CD
How do we move from source code to deployment? This is where continuous integration and continuous deployment, or CI/CD, come in. CI/CD automates integration, testing, and deployment, which we will cover on a high level.
The first step of a CI pipeline is to retrieve our source code. From there, a container image containing the code is created. We then run tests to ensure the software components work together. Finally, we register the container image in a registry. All of this can be triggered automatically whenever new code changes appear.
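A heavily simplified sketch of those CI steps, again using the Docker SDK for Python, might look as follows; the image name, registry, and test command are hypothetical, and a real pipeline would run these steps in a CI service with proper registry authentication.

```python
# Simplified sketch of the CI steps: build, test, register (illustrative only).
import docker

client = docker.from_env()
IMAGE = "registry.example.com/llm-app"  # hypothetical registry and repository

# 1. Build a container image from the checked-out source code.
image, _ = client.images.build(path=".", tag=f"{IMAGE}:latest")

# 2. Run the test suite inside the image to check components work together
#    (assumes pytest and the tests are included in the image).
test_output = client.containers.run(f"{IMAGE}:latest", command="pytest", remove=True)
print(test_output.decode())

# 3. Register (push) the image so the CD phase can retrieve it.
client.images.push(IMAGE, tag="latest")
```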
In the CD phase, we retrieve the container image from the registry and perform deployment tests to ensure everything works as expected. When deploying, we first test the application in a staging environment, then move to production once approved.
CI/CD enables seamless delivery and forms the foundation of modern LLMOps practices.
8. Scaling
Now that the application is nearly up and running, a final consideration is scaling. Self-hosted LLMs might need specialized GPU hardware.
When running the application, we may find that it cannot handle the load. There are two main scaling strategies. Horizontal scaling means adding more machines, like adding more cars to a road. Vertical scaling means boosting one machine's power, like upgrading a car's engine. Horizontal scaling suits high traffic and adds redundancy, while vertical scaling increases the capacity and speed of a single machine.
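As a toy illustration of horizontal scaling, the rule below computes how many replicas are needed for the observed load; it is loosely modeled on the proportional formula used by Kubernetes' Horizontal Pod Autoscaler, and all numbers are assumptions.

```python
# Toy sketch of a horizontal-scaling rule (all thresholds are assumptions).
import math

def desired_replicas(current_replicas: int, load_per_replica: float,
                     target_load: float, min_r: int = 1, max_r: int = 10) -> int:
    """Scale the replica count proportionally to the observed load per replica."""
    desired = math.ceil(current_replicas * load_per_replica / target_load)
    return max(min_r, min(max_r, desired))

# Example: 3 replicas each handling 90 requests/s against a target of 50.
print(desired_replicas(current_replicas=3, load_per_replica=90, target_load=50))  # -> 6
```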
9. Let's practice!
LLM application deployment isn't one-size-fits-all, but with these basics in mind, we're ready to practice!