
Collaboration and Version Control

1. Collaboration and version control

In any real team, you need to share work and track changes. Let's cover notebook sharing, permissions, and Databricks Repos.

2. Sharing notebooks

Sharing a notebook in Databricks is straightforward: open the permissions dialog and add a colleague or group. There are three permission levels you'll use most. "Can Read" lets someone view the notebook without running it, which is good for reviews. "Can Run" lets them execute cells on a cluster they have access to, which is ideal for analysts who need your pipeline but shouldn't modify it. "Can Edit" gives full write access. A fourth level, "Can Manage", additionally lets the holder change who else has access. You can also share a direct URL, but the recipient still needs the appropriate workspace permissions.
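When you share many notebooks, the same levels can also be granted programmatically. The snippet below is a minimal sketch, assuming the Databricks Permissions REST API endpoint `PATCH /api/2.0/permissions/notebooks/{notebook_id}` and its `access_control_list` body. The helper name and the IDs used with it are hypothetical, and actually sending the request (with an authenticated HTTP call) is left out.

```python
# Notebook permission levels as named in the Permissions API
# (assumed set; CAN_MANAGE is the level that can change permissions).
NOTEBOOK_LEVELS = {"CAN_READ", "CAN_RUN", "CAN_EDIT", "CAN_MANAGE"}


def build_share_request(notebook_id, principal, level, is_group=False):
    """Build (method, path, body) granting `level` on one notebook.

    Hypothetical helper. Assumes PATCH, which updates existing grants
    rather than replacing the notebook's whole access list.
    """
    if level not in NOTEBOOK_LEVELS:
        raise ValueError(f"unknown permission level: {level!r}")
    # Users and groups are identified by different keys in an entry.
    key = "group_name" if is_group else "user_name"
    body = {"access_control_list": [{key: principal, "permission_level": level}]}
    return "PATCH", f"/api/2.0/permissions/notebooks/{notebook_id}", body
```

For example, `build_share_request("1234", "analysts", "CAN_RUN", is_group=True)` describes giving an analyst group run access without edit rights, matching the scenario above.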

3. Collaboration features

Databricks has built-in collaboration features. You can leave comments on specific cells, which is useful for asynchronous code review. And there's a version history that shows who changed what and when, so you can roll back to a previous version if something breaks. These features work, but they have limits. The version history is linear - there's no concept of branches, pull requests, or merging. For a small team working on a few notebooks, that's probably fine. For anything larger, you need Git.

4. Enter Databricks Repos

Databricks Repos brings real version control into the workspace. You connect your Databricks workspace to a Git provider such as GitHub, GitLab, Bitbucket, or Azure DevOps, and clone repositories directly. From the Repos UI you can then create and switch branches, commit changes, and push to the remote; pull requests are opened and reviewed in your Git provider. This means your notebooks and code live in Git alongside your other software projects. Feature branches, code review, automated testing: the whole CI/CD workflow becomes possible.
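Cloning can also be scripted, which helps when setting up workspaces automatically. This is a minimal sketch, assuming the Repos REST API endpoint `POST /api/2.0/repos` with `url`, `provider`, and `path` fields. The provider identifier strings are assumptions to check against your workspace's API reference, the helper name is hypothetical, and sending the request with a bearer token is not shown.

```python
# Friendly names mapped to provider identifiers the Repos API is assumed
# to expect; verify these values against the API reference before use.
PROVIDERS = {
    "github": "gitHub",
    "gitlab": "gitLab",
    "bitbucket": "bitbucketCloud",
    "azure_devops": "azureDevOpsServices",
}


def build_clone_request(git_url, provider, workspace_path):
    """Build (method, path, body) that clones `git_url` into `workspace_path`.

    Hypothetical helper; raises ValueError for providers not in PROVIDERS.
    """
    try:
        provider_id = PROVIDERS[provider]
    except KeyError:
        raise ValueError(f"unsupported provider: {provider!r}") from None
    body = {"url": git_url, "provider": provider_id, "path": workspace_path}
    return "POST", "/api/2.0/repos", body
```

A call such as `build_clone_request("https://github.com/example-org/pipelines.git", "github", "/Repos/alice@example.com/pipelines")` uses placeholder values for the repository and workspace path.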

5. The CI/CD workflow

Here's what a typical workflow looks like. A developer creates a feature branch in Repos, writes and tests their notebook, then commits and pushes. A colleague reviews the pull request on GitHub, suggests changes, and approves. The branch gets merged to main, and a CI/CD pipeline automatically deploys the updated notebook to production. This is the same workflow software engineers have used for years, and Repos makes it accessible to data teams without leaving Databricks.
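The final deploy step can be as small as one API call. One common pattern, sketched here under assumptions, keeps a dedicated production Repo in the workspace and has the CI job check it out at the latest commit of `main` through `PATCH /api/2.0/repos/{repo_id}` with a `branch` field. The function names and the `PROD_REPO_ID` variable are hypothetical; `DATABRICKS_HOST` and `DATABRICKS_TOKEN` follow the usual naming for Databricks credentials.

```python
import json
import os
import urllib.request


def build_update_request(repo_id, branch="main"):
    """Build (method, path, body) that checks out `branch` at its latest commit."""
    return "PATCH", f"/api/2.0/repos/{repo_id}", {"branch": branch}


def deploy(host, token, repo_id, branch="main"):
    """Send the update request with a bearer token and return the parsed reply."""
    method, path, body = build_update_request(repo_id, branch)
    req = urllib.request.Request(
        host.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        method=method,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        # An empty reply body is treated as an empty JSON object.
        return json.loads(resp.read() or b"{}")


def deploy_from_env():
    """Entry point for a CI job; the environment variable names are assumptions."""
    return deploy(
        os.environ["DATABRICKS_HOST"],
        os.environ["DATABRICKS_TOKEN"],
        os.environ["PROD_REPO_ID"],  # hypothetical: ID of the production Repo
    )
```

A CI job would call `deploy_from_env()` after the merge to `main`. This only updates the files in the production Repo; jobs that reference notebooks in that folder would pick up the new code on their next run.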

6. Notebook versioning vs. Repos

It's worth being clear about the difference. Built-in notebook versioning gives you a linear history - useful for seeing what changed, but limited. Repos gives you the full power of Git: branches, pull requests, merge conflict resolution, and collaboration with people who don't even have Databricks access. The trade-off is setup - you need a Git provider and some initial configuration. For solo exploration, notebook versioning is fine. For team-based development and anything heading to production, Repos is the way to go.

7. Summary

Let's wrap up. Notebook sharing uses workspace permissions at three main levels. Built-in version history works for tracking changes but doesn't support branching. Databricks Repos integrates Git directly into the workspace, enabling branches, pull requests, and CI/CD workflows. For any team building data pipelines that go to production, Repos is essential. That closes out our chapter on compute and notebooks; next, we'll move into governance and data sharing.

8. Let's practice!

Let's practice.
