Querying runs

1. Querying runs

MLflow Tracking provides a way to log metrics, parameters, and artifacts to a centralized location during the model engineering and model evaluation phases of the ML lifecycle.

2. Model data

Throughout the ML lifecycle, we have likely built and experimented with many different models. We must now decide which model to use in our ML application by comparing metrics and other data.

3. Runs data

The MLflow Tracking UI offers a view into runs that belong to the same experiment, but it does not make it easy to compare runs or run calculations on them. Wouldn't it be great if we could query this run information for our own investigation? Luckily, MLflow offers a way of gathering this run information.

4. Searching runs

This is done through the search_runs function from the mlflow module. The search_runs function offers programmatic access to run data: it queries runs and returns the results for further analysis. With search_runs, users can analyze the results with a tool of their choice, such as the widely used pandas library. A pandas DataFrame is, in fact, the default output format of the function.
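As a minimal sketch of that call, the helper below wraps mlflow.search_runs and lets the caller choose between the default pandas output and a plain list of run objects. The helper name fetch_runs is hypothetical and not part of MLflow, and mlflow is imported inside the function only so that the sketch can be loaded on a machine without MLflow installed.

```python
# Output formats accepted by mlflow.search_runs.
VALID_OUTPUT_FORMATS = ("pandas", "list")


def fetch_runs(experiment_names, output_format="pandas"):
    """Query all runs of the named experiments (hypothetical helper).

    Returns a pandas DataFrame by default, or a list of Run objects
    when output_format is "list".
    """
    if output_format not in VALID_OUTPUT_FORMATS:
        raise ValueError(f"output_format must be one of {VALID_OUTPUT_FORMATS}")

    # Imported lazily so that defining this helper does not require MLflow.
    import mlflow

    return mlflow.search_runs(
        experiment_names=list(experiment_names),
        output_format=output_format,
    )
```

Calling fetch_runs(["Default"]) against a tracking server that has a Default experiment would return one row per run.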

5. Output format

Before using search_runs, it is important to understand what data is available to query and what is returned. The following example is a pandas output from search_runs for an experiment. MLflow places each metric and parameter into a separate column, alongside other data such as the run_id, status, start and end times, and tags. Each metric column is named metrics.<metric name>, and each parameter column is named params.<parameter name>.
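To make the column naming concrete, here is a small sketch that groups column names by their prefix. The list of column names is illustrative only: n_estimators is an assumed parameter name, and a real experiment will have its own metrics, parameters, and tags.

```python
def group_columns(columns):
    """Group search_runs-style column names by their prefix."""
    groups = {"metrics": [], "params": [], "tags": [], "other": []}
    for name in columns:
        # Split on the first dot only, so "tags.mlflow.user" keeps "mlflow.user".
        prefix, separator, rest = name.partition(".")
        if separator and prefix in ("metrics", "params", "tags"):
            groups[prefix].append(rest)
        else:
            groups["other"].append(name)
    return groups


# Illustrative column names in the shape search_runs produces.
example_columns = [
    "run_id",
    "experiment_id",
    "status",
    "artifact_uri",
    "start_time",
    "end_time",
    "metrics.f1_score",
    "metrics.precision_score",
    "params.n_estimators",  # assumed parameter name, for illustration
    "tags.mlflow.user",
]

grouped = group_columns(example_columns)
print(grouped["metrics"])
print(grouped["params"])
```

With a real DataFrame, passing its columns attribute to group_columns gives a quick overview of what can be queried.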

6. Filtering run searches

The search_runs function is flexible and accepts several arguments for tailoring the results to our needs. These include max_results, which limits the number of runs returned; order_by, which sorts the results by columns such as metrics in ascending or descending order; filter_string, likely the most powerful argument, which selects runs based on a query string; and experiment_names, which restricts the results to the specified experiments. More than one experiment can be specified.
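The four arguments can be sketched together as follows. The helpers build_search_kwargs and search are hypothetical names introduced for illustration; only the keyword names themselves come from mlflow.search_runs. The values used here (two experiment names, a limit of five) are examples, not recommendations.

```python
def build_search_kwargs(experiment_names, filter_string="", order_by=None, max_results=None):
    """Assemble keyword arguments for mlflow.search_runs (hypothetical helper)."""
    kwargs = {
        "experiment_names": list(experiment_names),  # one or more experiments
        "filter_string": filter_string,              # e.g. "metrics.f1_score > 0.6"
    }
    if order_by:
        # Each entry is a column with an optional ASC or DESC suffix.
        kwargs["order_by"] = list(order_by)
    if max_results is not None:
        kwargs["max_results"] = max_results
    return kwargs


def search(**kwargs):
    """Run the query; mlflow is imported lazily so the sketch loads without it."""
    import mlflow

    return mlflow.search_runs(**kwargs)


search_kwargs = build_search_kwargs(
    ["Insurance Experiment", "Default"],
    filter_string="metrics.f1_score > 0.6",
    order_by=["metrics.f1_score DESC"],
    max_results=5,
)
print(search_kwargs)
```

Passing the result on with search(**search_kwargs) would return at most five matching runs, sorted by f1_score from highest to lowest.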

7. Tracking UI

The following experiment has 4 runs and contains both metrics and parameters that can be queried. Let's use the search_runs function to query the runs from the Default experiment so that we can get the run data into a format in which we can begin our analysis.

8. Search runs example

Let's say we want to search runs from the Insurance Experiment and query for runs with an f1_score metric greater than 0.6. We also want to order the results by precision_score in descending order. Begin by importing the mlflow module. Let's store our filter string in a variable to make it easier to pass in as an argument. Now let's call the search_runs function with the filter_string and order_by arguments and include the experiment name "Insurance Experiment".
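The steps just described might look like the sketch below. The variable name f1_score_filter and the wrapper function search_insurance_runs are assumptions made for illustration. The import sits inside the function only so that the snippet can be loaded without MLflow installed; in a script it would normally be at the top, as described above.

```python
# Filter string stored in a variable so it is easy to pass as an argument.
f1_score_filter = "metrics.f1_score > 0.6"


def search_insurance_runs():
    """Query the Insurance Experiment for runs with f1_score above 0.6."""
    import mlflow

    return mlflow.search_runs(
        experiment_names=["Insurance Experiment"],
        filter_string=f1_score_filter,
        order_by=["metrics.precision_score DESC"],  # highest precision first
    )
```

Calling search_insurance_runs() returns a pandas DataFrame with one row per matching run.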

9. Example output

Our query returns two runs with an f1_score greater than 0.6.

10. Let's practice!

Now that we have a better understanding of the search_runs function, let's practice by querying runs from our Unicorn experiments.
