Getting started with the Databricks SDK

1. Getting started with the Databricks SDK

Hello, I'm Avi Steinberg.

2. Why learn Databricks?

I am a software engineer with experience working for multiple technology startups and Fortune 500 companies. I've used Databricks professionally for several years, and in this course I will teach you how to use the Databricks Software Development Kit (SDK) for Python. This is a collection of libraries and APIs that let us interact with a Databricks workspace. Why learn Databricks? Because it is one of the leading machine learning, AI and data engineering cloud platforms on the market. It is used by over 60% of Fortune 500 companies.

3. What is a Databricks Workspace?

So what is a Databricks Workspace? It is a collection of Databricks cloud resources that functions as an environment to access Databricks assets. Some examples of assets we may have in our workspace are clusters, notebooks and jobs.

4. Install Databricks SDK

Our entry point into our Databricks workspace is the `WorkspaceClient`, which we import from the `databricks.sdk` module. We instantiate an object of the `WorkspaceClient` class and use it to interact with our Databricks workspace. But how does the workspace client know which Databricks workspace to interact with? The answer is authentication.
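As a minimal sketch of this step, assuming the SDK has been installed from PyPI with `pip install databricks-sdk`: the helper name `make_workspace_client` is hypothetical and used only for illustration. The import sits inside the function so the snippet can be loaded even before the package is installed.

```python
def make_workspace_client():
    """Create a WorkspaceClient (assumes `pip install databricks-sdk` was run)."""
    # Imported lazily so that defining this helper does not require the SDK.
    from databricks.sdk import WorkspaceClient

    # With no arguments, the client looks for credentials in the environment.
    return WorkspaceClient()
```

Calling `make_workspace_client()` before any credentials are configured is expected to fail, which is why authentication comes next.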

5. Authentication environment variables

An environment variable is a key/value pair stored on a computer system, typically used to configure application behavior based on external settings. For the service principal authentication used in this course, the `WorkspaceClient` reads three environment variables: `DATABRICKS_CLIENT_SECRET`, `DATABRICKS_CLIENT_ID` and `DATABRICKS_HOST`. The `DATABRICKS_HOST` is the same as the URL we navigate to in our browser to access our Databricks workspace. It has the form `<workspace_id>.cloud.databricks.com`. To get the values for `DATABRICKS_CLIENT_ID` and `DATABRICKS_CLIENT_SECRET`, we can authenticate using a service principal.
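Before creating a client, it can help to confirm that all three variables are present. The sketch below uses only the standard library; the names `REQUIRED_VARS` and `missing_databricks_vars` are hypothetical helpers, not part of the Databricks SDK.

```python
import os

# The three variables named above (hypothetical constant, not an SDK name).
REQUIRED_VARS = (
    "DATABRICKS_HOST",
    "DATABRICKS_CLIENT_ID",
    "DATABRICKS_CLIENT_SECRET",
)


def missing_databricks_vars(env=None):
    """Return the required variable names that are unset or empty."""
    # Default to the real process environment; a dict can be passed for testing.
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

For example, `missing_databricks_vars()` returns an empty list once all three variables have been set in the current process.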

6. Authenticate using a Service Principal

A "service principal" is a security identity within a cloud platform that represents an application or service, allowing it to authenticate and access resources without requiring a human user login. Service principals are useful when you want to create identities with different levels of permissions for different resources. To authenticate to a Databricks workspace using a service principal, we log into our workspace, create a service principal, assign permissions to it and then create an OAuth secret that consists of a `ClientId` and a `ClientSecret`. These values can be used to set the `DATABRICKS_CLIENT_ID` and `DATABRICKS_CLIENT_SECRET` environment variables.

7. Default authentication

We can use `os.environ` from the `os` library, which behaves like a dictionary, to set the desired values for these environment variables. With default authentication, we instantiate the `WorkspaceClient` without passing in any parameters. It automatically reads the values of the three environment variables, `DATABRICKS_CLIENT_SECRET`, `DATABRICKS_CLIENT_ID` and `DATABRICKS_HOST`, and authenticates to the corresponding Databricks workspace.
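A rough sketch of default authentication follows. The function names `configure_databricks_env` and `default_client` are hypothetical, and any values passed in are placeholders to be replaced with your own workspace URL and OAuth secret.

```python
import os


def configure_databricks_env(host, client_id, client_secret):
    """Set the three variables for the current Python process only."""
    # os.environ is a mapping, so values are assigned by key.
    os.environ["DATABRICKS_HOST"] = host
    os.environ["DATABRICKS_CLIENT_ID"] = client_id
    os.environ["DATABRICKS_CLIENT_SECRET"] = client_secret


def default_client():
    """Create a client that picks up the variables set above."""
    # Lazy import: requires the databricks-sdk package to be installed.
    from databricks.sdk import WorkspaceClient

    # No parameters are passed; the values come from the environment.
    return WorkspaceClient()
```

In practice, secrets are better supplied from outside the script (for example by the shell or a secrets manager) than hard-coded in source files.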

8. Interacting with our Databricks workspace

Now that we have learned how to authenticate the `WorkspaceClient`, we can interact with our Databricks workspace. Databricks Spark clusters can be used to run data-intensive workloads on customized infrastructure. We can create, list and delete clusters in our workspace using the `WorkspaceClient`. This is just a taste of what the `WorkspaceClient` is capable of; it can be used to programmatically do anything that we can do manually in the Databricks web app, and more.
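As one illustration of listing clusters, the sketch below assumes an already authenticated `WorkspaceClient` is passed in. It relies on the SDK's `clusters.list()` method and the `cluster_name` attribute of each returned cluster; the wrapper name `list_cluster_names` is hypothetical.

```python
def list_cluster_names(client):
    """Return the names of the clusters visible to the given client."""
    # client.clusters.list() yields one object per cluster in the workspace.
    return [cluster.cluster_name for cluster in client.clusters.list()]
```

With a real client this might be used as `list_cluster_names(WorkspaceClient())`, and the result depends on which clusters the service principal has permission to see.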

9. Let's practice!

Let's practice authenticating the Databricks `WorkspaceClient`!
