Getting started with Databricks SQL
1. Getting started with Databricks SQL
Hello, and welcome back! In this video, we will go over the basics of getting started with Databricks SQL.

2. SQL Compute vs. General Compute
To start things off, we must create compute to run our SQL workloads. In Databricks SQL, we do not use the same clusters that we use for data engineering or data science workloads. Instead, we create a SQL warehouse. These warehouses come with the optimizations needed to run efficient, performant, and scalable SQL queries in the lakehouse.
3. SQL Warehouse

You will notice that configuring these warehouses is much more straightforward, with far fewer options than creating a compute cluster. This is by design: data analysts want to focus on their work, not on building a complicated cluster. Let us take a few seconds to go over the options available to you.
4. SQL Warehouse

The main configuration users will focus on is the overall size of the SQL warehouse. With SQL warehouses, instead of picking the specific instance types that your queries run on, you choose a T-shirt size indicating how much compute you need. Beyond that, SQL warehouses scale and terminate automatically based on the limits you set and the query load on the warehouse. Note that scaling for SQL warehouses is horizontal: you add another cluster of the same size rather than scaling up or down to a different size.
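To make the sizing and scaling options concrete, here is a minimal sketch of a request body for creating a warehouse through the SQL Warehouses REST API (POST /api/2.0/sql/warehouses). The warehouse name is made up, and the field names reflect the API as I understand it; check the Databricks REST API reference for your workspace before relying on them.

```json
{
  "name": "analyst-warehouse",
  "cluster_size": "Medium",
  "min_num_clusters": 1,
  "max_num_clusters": 3,
  "auto_stop_mins": 30
}
```

With these settings, the warehouse starts as a single Medium cluster, scales out horizontally to as many as three Medium clusters under heavy query load, and shuts down after 30 minutes of inactivity.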
5. SQL Warehouse

The last configuration you must consider is the type of SQL warehouse you will create. Databricks offers a few different types of SQL warehouses.
6. SQL Warehouse Types

The Classic and Pro warehouse types follow the traditional compute architecture, where the compute nodes are created in the customer's cloud environment. The two differ in their level of feature support, but both come with Photon for SQL query optimization. Databricks SQL also supports a serverless option. A serverless warehouse is created and maintained in the Databricks cloud environment instead of the customer's. Choosing this mode gives the user the fastest warehouse start-up time and access to the latest features in Databricks SQL, and it also removes any warehouse maintenance overhead for the customer.
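As an illustration, the warehouse type comes down to a couple of fields in that same create request. This is a sketch under the same assumptions as above; in particular, my understanding is that serverless is requested by enabling serverless compute on a Pro warehouse, but verify this against the API reference for your workspace.

```json
{
  "name": "serverless-warehouse",
  "cluster_size": "Small",
  "warehouse_type": "PRO",
  "enable_serverless_compute": true
}
```

Setting "warehouse_type" to "CLASSIC", or leaving "enable_serverless_compute" false, keeps the compute nodes in your own cloud account instead.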
7. SQL Editor

Once a SQL warehouse is created, you are ready to start writing your queries! The SQL Editor will be your main page for SQL queries and should have a familiar look and feel. On the left-hand side, you will see a list of all the catalogs, schemas, and tables you have access to. At the top, there is a box for writing your SQL queries. At the bottom, you will see the results of your query and any visuals you have created from those results. Note that these features are unique to the SQL Editor and are not available in the notebook interface.
8. Common SQL Commands

While ANSI SQL and Databricks SQL share many capabilities, a few common commands beyond the standard SELECT * queries are worth knowing. COPY INTO is how we take raw data from the data lake and load it into a table; this covers the extract-and-load portion of ETL. In the first example below, we copy files into my_table while merging them into the existing schema. Users will also rely on the CREATE <entity> AS pattern to create some kind of queryable entity, such as a table or view; in traditional SQL work, you may have heard this called the CTAS pattern. In the second example below, we create a Delta table from our raw data and filter it along the way.
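The slide code itself is not reproduced in this transcript, so here is a hedged reconstruction of the COPY INTO example. The table name, source path, and file format are placeholders, not the course's actual values.

```sql
-- Ingest raw JSON files from the data lake into a table.
-- The target table must already exist (it can be created empty);
-- mergeSchema lets new columns evolve into the existing table schema.
COPY INTO my_table
FROM '/mnt/raw/events/'
FILEFORMAT = JSON
COPY_OPTIONS ('mergeSchema' = 'true');
```

And a sketch of the CTAS example; the raw_sales table, the amount column, and the filter are likewise made up for illustration.

```sql
-- CTAS: create a table (Delta is the default format on Databricks)
-- from raw data, filtering rows as part of the definition.
CREATE TABLE cleaned_sales AS
SELECT *
FROM raw_sales
WHERE amount > 0;
```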
9. Let's practice!

Now that we understand the basics of Databricks SQL, let's start writing some queries!