Basic operations with Databricks Python SDK
1. Basic operations with Databricks Python SDK
Welcome back! Now that we've set up and authenticated a `WorkspaceClient`, let's walk through some examples of interacting with our Databricks workspace.

2. Databricks Clusters API
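As a quick reminder of that setup step, the client used in every example can be created like this. This is a sketch: `WorkspaceClient()` resolves credentials automatically, for example from the `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variables or from a profile in `~/.databrickscfg`, as covered in the previous video.

```python
def get_client():
    """Connect to our Databricks workspace.

    WorkspaceClient() resolves credentials automatically, e.g. from the
    DATABRICKS_HOST / DATABRICKS_TOKEN environment variables or from a
    profile in ~/.databrickscfg.
    """
    from databricks.sdk import WorkspaceClient  # pip install databricks-sdk
    return WorkspaceClient()
```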
A Databricks cluster is a set of computation resources and configurations typically used to run data-intensive workloads, and it can be shared by multiple users for collaborative analysis. Some example use cases for clusters are production ETL pipelines, streaming analytics, and ad-hoc analytics. We will be working with all-purpose clusters and job clusters. The Databricks Clusters API allows us to interact with clusters to perform operations such as creating, starting, listing, and deleting all-purpose clusters.

3. Listing clusters
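A minimal sketch of the listing operation mentioned above, assuming the `databricks-sdk` package is installed and workspace credentials are available in the environment; the live call is guarded so the snippet only contacts a workspace when credentials are configured:

```python
import os


def list_cluster_ids(w):
    """Return the cluster_id of every cluster in the workspace."""
    return [c.cluster_id for c in w.clusters.list()]


# Only attempt a live call when workspace credentials are configured.
if os.environ.get("DATABRICKS_HOST"):
    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()  # authenticate and connect to the workspace
    for cluster_id in list_cluster_ids(w):
        print(cluster_id)  # e.g. an ID beginning with 0113
```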
Assuming a cluster was previously created in our Databricks workspace, either through the UI or using the SDK, let's walk through an example of listing the clusters in a workspace. First, we authenticate and connect with the `WorkspaceClient` so we can communicate with our Databricks workspace. Next, we loop through all clusters returned by the Clusters API `list()` method and print each `cluster_id`; each cluster is identified by its unique `cluster_id` attribute. We can see from the output that there is one cluster in our workspace, with an ID beginning with 0113.

4. Databricks Jobs API
What is a Databricks job? A Databricks job defines code that can run on a Databricks cluster in our workspace, optionally at scheduled intervals. The task orchestration, cluster management, monitoring, and error reporting for all of our jobs are managed by Databricks, allowing us to focus on the code we want to run. We can configure our job as a single task or as a complex, multi-task workflow that can optionally be scheduled to run at specific times. With the Databricks Jobs API, we can do anything we'd normally do in the UI: create, modify, run, or delete jobs programmatically.

5. Listing jobs
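The listing pattern mirrors the clusters example above; here is a sketch under the same assumptions (the `databricks-sdk` package installed, credentials in the environment, and the live call guarded accordingly):

```python
import os


def list_job_ids(w):
    """Return the job_id of every job defined in the workspace."""
    return [job.job_id for job in w.jobs.list()]


# Only attempt a live call when workspace credentials are configured.
if os.environ.get("DATABRICKS_HOST"):
    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()  # same client setup as in the clusters example
    for job_id in list_job_ids(w):
        print(job_id)
```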
We can use the Jobs API `list()` method to list the job IDs for all of the jobs in our Databricks workspace. Similar to the clusters example, the first thing we need to do is instantiate the `WorkspaceClient` object so that we can interact with our workspace. The main difference is that instead of using the Clusters API, here we use the Jobs API `list()` method.

6. Databricks Jobs Dashboard
The resources for a workspace can be viewed online at `https://<workspace-deployment-name>.cloud.databricks.com`. We can see the created jobs in the Databricks UI by navigating to the Jobs tab inside Workflows.

7. Databricks Notebooks
In this course, we run Python code from a Databricks Notebook inside a Databricks job. We won't be creating notebooks in this course, but we will assume there are existing Databricks notebooks in our workspace that perform various data analytics operations. A simple Databricks notebook might contain nothing more than a single cell that runs `print("Hello World!")`.

8. Let's try it out!
Check out the Databricks SDK Python documentation to explore other functionality that the SDK provides.