Ideation phase
1. Ideation phase
Let's explore the initial ideation phase of LLM application development.
2. LLM lifecycle: Ideation phase
During the ideation phase, we dive into understanding the business problem through data sourcing and base model selection, ensuring the right scope and requirements for the development and operational phases that follow.
3. Data sourcing
Providing the latest data is essential when developing our LLM application. This data serves as the fuel that powers the reasoning capabilities of the model. Data sourcing involves identifying needs, finding sources, and ensuring accessibility of the data we want to use. Three questions will guide us.
4. Data sourcing
Our first question is: Is the data relevant? This involves identifying the right information from internal or external sources. Our second question is: Is the data available? Sometimes, we need to transform the data to make it ready, and additional databases might be needed to make text data searchable. Evaluating costs, particularly for external data, is crucial, as accessing it may incur charges. Finally, consider access limitations related to volume or frequency. The last question is: Does the data meet standards? These standards may concern quality and governance. If the data contains confidential or sensitive information, it could impact the choice of base model, since we might need to guarantee that it remains within the organization.
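To make these three questions concrete, here is a minimal sketch of how a team might record the answers for each candidate source. The class, field names, and the example source are hypothetical and only illustrate the checklist; they are not part of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class DataSourceAssessment:
    """Answers to the three data sourcing questions for one candidate source."""
    name: str
    is_relevant: bool              # Does it contain the right information for the use case?
    is_available: bool             # Can we access it, after any needed transformation?
    access_cost_per_month: float   # Charges for external data, if any
    rate_limited: bool             # Limitations on volume or frequency of access
    meets_quality_standards: bool  # Quality and governance standards
    contains_sensitive_data: bool  # May constrain the choice of base model

    def usable(self) -> bool:
        return self.is_relevant and self.is_available and self.meets_quality_standards


# Hypothetical example: an internal knowledge base
kb = DataSourceAssessment(
    name="internal-knowledge-base",
    is_relevant=True,
    is_available=True,
    access_cost_per_month=0.0,
    rate_limited=False,
    meets_quality_standards=True,
    contains_sensitive_data=True,  # suggests keeping the model in-house
)
print(kb.usable())
```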
5. Selecting the base model
After identifying data sources, the next step is selecting the base model. Most organizations choose pre-trained models, which have already been trained on a significant amount of text data. The first decision is whether to use a proprietary or an open-source model.
6. Proprietary models (privately owned)
Proprietary models, such as the ones shown here, are privately owned, while open-source ones are publicly accessible. A crucial consideration is whether exposing data to a third party is acceptable, since proprietary models cannot be hosted within the confines of the organization. Proprietary options offer ease of setup and use, quality assurance, and guarantees on reliability, speed, and availability, but they require us to expose our data and offer limited customization.
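As a rough illustration of this trade-off, a proprietary model is typically reached through a hosted API, so the prompt, including any internal data we add to it, leaves the organization. The sketch below assumes the OpenAI Python client with an API key set in the environment; the model name is only an example of a proprietary model.

```python
from openai import OpenAI

# Assumes the OPENAI_API_KEY environment variable is set.
client = OpenAI()

# Note: the prompt, including any internal data it contains,
# is sent to a third-party service.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # example proprietary model name
    messages=[{"role": "user", "content": "Summarize our data sourcing checklist."}],
)
print(response.choices[0].message.content)
```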
7. Open-source (publicly accessible)
Open-source options offer advantages such as in-house hosting, transparency, and full customizability. However, they come with limited support and do not always allow commercial use. Moreover, customizing these models requires dedicated AI engineers. These models can be downloaded from online model hubs.
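By contrast, an open-source model can be downloaded from a model hub and run in-house. Here is a minimal sketch assuming the Hugging Face transformers library is installed; the model name is just an example of a small, openly available model.

```python
from transformers import pipeline

# Downloads the model weights from the Hugging Face Hub on first use
# and runs them locally, so prompts never leave our infrastructure.
generator = pipeline("text-generation", model="distilgpt2")  # example open model

output = generator(
    "The ideation phase of an LLM project starts with",
    max_new_tokens=30,
)
print(output[0]["generated_text"])
```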
8. Factors in model selection
After deciding between proprietary and open-source models, we need to narrow down our choice to reach a final selection. This means evaluating factors in four categories: performance, model characteristics, practical considerations, and less important secondary factors. For performance, we should consider response quality, often better with the most recently released models, and speed, which is crucial for real-time applications. For model characteristics, consider the data that was used to train the model, ranging from webpages to codebases, as it affects the responses we can expect. The context window size refers to the amount of preceding text the model can take into account when predicting the next word, which influences quality. Finally, fine-tunability determines whether developers can optionally adjust the model for better performance.
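Context windows are usually measured in tokens rather than words, so it helps to check how many tokens a typical prompt uses before settling on a model. A small sketch, assuming the tiktoken tokenizer library; the context limit shown is a hypothetical example for a candidate model.

```python
import tiktoken

# Tokenizer used by several recent OpenAI models; other model families
# ship their own tokenizers, so counts are approximate across models.
encoding = tiktoken.get_encoding("cl100k_base")

prompt = "Long document text we want the model to reason over..."
n_tokens = len(encoding.encode(prompt))

context_window = 8192  # hypothetical limit of a candidate model
print(f"{n_tokens} tokens used out of {context_window}")
if n_tokens > context_window:
    print("Prompt will not fit; consider a model with a larger context window.")
```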
9. Factors in model selection
For practical considerations, look at the type of license associated with the model, which is especially relevant for open-source models with commercial restrictions. The cost and the environmental impact should also be considered. Finally, there are less important secondary factors, such as the number of parameters and popularity. These are often indicators of quality, speed, cost, and power usage.
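One lightweight way to weigh all of these factors together is a simple weighted scorecard over a shortlist of candidates. The candidate names, scores, and weights below are entirely hypothetical placeholders used only to illustrate the idea.

```python
# Hypothetical 1-5 scores per factor for two placeholder candidates.
candidates = {
    "model-a": {"quality": 5, "speed": 3, "license": 4, "cost": 2, "environmental_impact": 2},
    "model-b": {"quality": 4, "speed": 4, "license": 5, "cost": 4, "environmental_impact": 4},
}

# Weights reflect how much each factor matters for this project (hypothetical).
weights = {"quality": 0.4, "speed": 0.2, "license": 0.15, "cost": 0.15, "environmental_impact": 0.1}


def weighted_score(scores: dict) -> float:
    return sum(weights[factor] * value for factor, value in scores.items())


print({name: round(weighted_score(s), 2) for name, s in candidates.items()})
best = max(candidates, key=lambda name: weighted_score(candidates[name]))
print("Shortlisted:", best)
```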
10. Let's practice!
With data and base models set, we move on to development. But first, let's practice!