Managing and Configuring Clusters

1. Managing and configuring clusters

Clusters aren't "set and forget." Understanding their lifecycle and cost controls will save your team real money.

2. Finding your clusters

The Compute page gets crowded in a large organization. The default "All Clusters" view shows everything in the workspace. Use filters to narrow things down. "My Clusters" shows what you created. "Shared with Me" shows clusters others gave you access to. You can also filter by state - just running clusters, or terminated ones you need to restart.

3. Cluster states

Clusters move through a predictable set of states. Creating or starting one puts it in "Pending" while virtual machines spin up. Once ready, it moves to "Running." Restart briefly enters "Restarting" - machines stay, software resets. Stopping moves through "Terminating" to "Terminated." The configuration is preserved, so you can restart later without reconfiguring.
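The lifecycle above can be sketched as a small state machine. The state names come from the Compute page as described; the transition map itself is a simplified illustration, not the full Databricks lifecycle.

```python
# Simplified sketch of the cluster lifecycle: states come from the
# Compute page; the events and transition map are illustrative.
TRANSITIONS = {
    "Terminated": {"start": "Pending"},
    "Pending": {"ready": "Running"},
    "Running": {"restart": "Restarting", "stop": "Terminating"},
    "Restarting": {"ready": "Running"},
    "Terminating": {"done": "Terminated"},
}

def next_state(state: str, event: str) -> str:
    """Return the state a cluster moves to when an event occurs."""
    return TRANSITIONS[state][event]

# Walk a full stop/start cycle: the cluster ends up Running again,
# with its configuration preserved throughout.
state = "Running"
for event in ["stop", "done", "start", "ready"]:
    state = next_state(state, event)
print(state)  # → Running
```

Note that "Restarting" loops straight back to "Running" without passing through "Terminated," which is why a restart keeps the underlying machines.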

4. Termination: what you keep and what you lose

Terminating a cluster is not the same as deleting it. Configuration, notebooks, and cloud storage data are safe. What you lose is anything in memory - cached DataFrames, temporary variables, and manually pip-installed packages. That catches people off guard. For recurring needs, use init scripts or cluster libraries so they reload automatically.
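One way to make packages survive termination is to declare them as cluster libraries rather than pip-installing them by hand. Below is a minimal sketch of library specs in the shape the Databricks Libraries API accepts; the package names and versions are placeholders, not recommendations.

```python
# Sketch of cluster library specs (the shape used by the Databricks
# Libraries API). Package names and versions are placeholders.
cluster_libraries = [
    {"pypi": {"package": "pandas==2.2.2"}},
    {"maven": {"coordinates": "com.example:my-lib:1.0"}},  # hypothetical coordinates
]

# Libraries declared this way are part of the cluster configuration,
# so they are reinstalled automatically on every start -- unlike
# packages installed manually with pip inside a notebook session.
for lib in cluster_libraries:
    kind, spec = next(iter(lib.items()))
    print(kind, spec)
```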

5. When to restart

Restarting is your first troubleshooting step for most cluster issues. Just installed a library and it's not being picked up? Restart. Cluster performance degraded after hours of use? Restart - it clears the memory and resets the Spark context. Updated an init script? You'll need a restart for it to take effect. Restarting is quick because it doesn't release the underlying machines. It just resets the software, which typically takes under a minute.

6. Autoscaling and auto-termination

Two settings directly affect your bill. Autoscaling sets a minimum and maximum worker count - the cluster grows under load and shrinks when idle. Auto-termination is your safety net - if nobody uses the cluster for a set period, it shuts down. Both are easy to overlook during setup, but they're the difference between a reasonable bill and a month-end surprise.
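In a cluster spec, both settings are a few fields. Here is a sketch in the shape used by the Databricks Clusters API; the cluster name, runtime version, node type, and worker counts are illustrative values, not recommendations.

```python
# Sketch of the cost-control fields in a cluster spec (the shape used
# by the Databricks Clusters API). All concrete values are illustrative.
cluster_spec = {
    "cluster_name": "etl-nightly",        # hypothetical name
    "spark_version": "15.4.x-scala2.12",  # illustrative runtime
    "node_type_id": "i3.xlarge",          # illustrative node type
    "autoscale": {
        "min_workers": 2,  # floor the cluster shrinks to when idle
        "max_workers": 8,  # ceiling it grows to under load
    },
    # Shut down automatically after an hour with no activity.
    "autotermination_minutes": 60,
}
```

Setting `autotermination_minutes` to 0 disables the safety net entirely, which is exactly the month-end surprise the section warns about.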

7. Cluster policies

In an organization with dozens of users, you can't rely on everyone making sensible choices. That's where cluster policies come in. An admin defines a policy that sets guardrails - maximum number of workers, allowed machine types, mandatory auto-termination. When a user creates a cluster, they work within those bounds. They still choose their runtime and attach their notebooks, but they can't accidentally spin up a 50-node cluster with the most expensive GPU instances. Policies are about governance without friction.
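A policy is a set of rules over cluster attributes. The sketch below uses the JSON schema Databricks policies are written in (`range` and `allowlist` rule types); the specific limits and node types are illustrative, not a recommended policy.

```python
# Sketch of a cluster policy definition (Databricks policy JSON schema).
# The limits and allowed values here are illustrative.
policy = {
    # Cap cluster size: users can scale up to at most 10 workers.
    "autoscale.max_workers": {"type": "range", "maxValue": 10},
    # Make auto-termination mandatory, with a two-hour ceiling.
    "autotermination_minutes": {
        "type": "range",
        "minValue": 10,
        "maxValue": 120,
        "defaultValue": 60,
    },
    # Restrict machine types to an approved, non-GPU list.
    "node_type_id": {
        "type": "allowlist",
        "values": ["i3.xlarge", "i3.2xlarge"],  # illustrative node types
    },
}
```

Anything the policy doesn't constrain - the runtime version, which notebooks to attach - stays a free choice for the user, which is the "governance without friction" idea.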

8. Summary

Here's the short version. Use filters to cut through the noise on the Compute page. Understand what termination preserves and what it clears. Restart as your first troubleshooting step. Turn on autoscaling and auto-termination to keep costs in check. And if you're an admin, use policies to set guardrails for the whole team. Next up, we'll move from compute to development - Databricks notebooks.

9. Let's practice!

Let's put this into practice. You'll filter clusters, inspect their configuration, and work through a management scenario.
