1. Querying in the Data Intelligence Platform
Hello there! In this video, we will be going over how to query data and perform analyses using Databricks SQL.
2. Motivation
Up to this point, we have learned how to use Databricks SQL to process data through our lakehouse architecture. We have ingested raw data and then cleaned, joined, or filtered it into the right shape based on our knowledge of the data.
3. Motivation
Now we are finally ready to start analyzing our data! In this phase of the lakehouse architecture, we will most likely be analyzing datasets created as part of the Gold, or BI-ready, layer. These datasets are used to build out the different consumable data layers, such as data marts.
4. SQL query basics
Databricks SQL is built on the ANSI SQL standard, a widely adopted open standard designed for portability and easy translation between different SQL dialects. This means queries in Databricks look very similar to those in other database platforms.
Queries follow the standard SELECT FROM structure, support a variety of functions, and can be run against any dataset Databricks can access, such as Delta tables or data from a JDBC connection.
Here’s a simple example: we select specific fields from a source table and filter the results based on certain criteria, such as selecting a subset of sales data. While this may look familiar, it's a great starting point for performing analyses in Databricks SQL.
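A rough sketch of what such a query might look like is shown below; the sales table and its columns (order_id, region, amount) are placeholder names for illustration, not the exact example from the slide.

    -- Select a few fields from a sales table and keep only matching rows
    SELECT order_id, region, amount
    FROM sales
    WHERE region = 'West'
      AND amount > 100;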
5. Common functions
When writing queries for analysis, we often want to do more than just retrieve raw data. We want to enhance and modify the results using functions.
Databricks SQL offers a wide range of functions, including those from SQL, Python, and the Spark framework. These range from basic string and numeric functions like ROUND() or CONCAT(), to logical operations like CASE or IF(). This allows data analysts to embed specific business logic directly into their queries. Additionally, users can create custom functions, known as User Defined Functions (UDFs), and incorporate them into their analyses.
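As a quick illustration, here is a minimal sketch of defining and using a SQL UDF in Databricks SQL; the function name and the discount logic are made-up examples rather than part of the course material.

    -- Define a simple SQL UDF that applies a discount to a price
    CREATE FUNCTION apply_discount(price DOUBLE, pct DOUBLE)
      RETURNS DOUBLE
      RETURN price * (1 - pct);

    -- Use the UDF like any built-in function
    SELECT order_id, apply_discount(amount, 0.10) AS discounted_amount
    FROM sales;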
In this example, we've taken the query from the previous slide and enhanced it by applying several of these functions to refine and reshape the results.
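A sketch of how the earlier query could be enhanced with built-in functions follows; the column names and the threshold used in the CASE expression are assumptions for illustration, not the exact example shown on the slide.

    -- Round the amount, build a readable label with CONCAT,
    -- and embed business logic with a CASE expression
    SELECT
      order_id,
      CONCAT(region, ' - ', CAST(order_id AS STRING)) AS order_label,
      ROUND(amount, 2) AS rounded_amount,
      CASE WHEN amount > 500 THEN 'large' ELSE 'standard' END AS order_size
    FROM sales
    WHERE region = 'West';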
6. Visualizations
After crafting a query that returns the necessary data, data analysts often turn to visualizations to better communicate insights and tell a compelling story. As demonstrated, visualizations can be created directly within the SQL Editor in Databricks SQL. Databricks supports a wide range of common chart types, giving analysts full control over how they present and interpret their data.
7. Let's practice!
Now, let's go practice writing some queries in Databricks SQL!