1. Data Intelligence Platform - Analytics
Welcome! In this video, we will give a high-level overview of performing analytics in the Databricks environment.
2. Why do organizations care about analytics?
It probably goes without saying, but organizations care very deeply about data analytics. Leaders in every industry can benefit from data analytics, and most have likely invested significant time in a data strategy.
Therefore, organizations need to make sure that they set up an environment where people can perform the most valuable, scalable, and secure analyses possible.
3. Supported Languages
As a data persona, you have your choice of language in Databricks, as the platform supports the four most common in the world of data analytics.
Scala is one such supported language. Running on the Java Virtual Machine and interoperable with Java, it is a common option in the world of data engineering.
Python is another supported language, and is by far one of the most popular. Python is very general, and can be used for any data workload.
SQL is also supported, and is very common with the analyst persona. While SQL is generally thought of as a language for Business Intelligence use cases, it can also be useful for data engineering.
R is the fourth language that Databricks supports. Most common in the world of statistics, R is generally used in data science applications.
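To make this concrete, here is a minimal sketch of how the same simple aggregation could be expressed in both SQL and Python. The sales data is hypothetical, and sqlite3 is used here purely as a local stand-in for a SQL engine; on Databricks itself you would run the SQL against a table in your workspace.

```python
import sqlite3

# Hypothetical sales data -- a stand-in for a table in your workspace.
rows = [("north", 100), ("south", 250), ("north", 175)]

# SQL flavor: the aggregation an analyst might write in a SQL editor.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
sql_totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)

# Python flavor: the same logic expressed programmatically.
py_totals = {}
for region, amount in rows:
    py_totals[region] = py_totals.get(region, 0) + amount

# Both approaches produce the same totals per region.
assert sql_totals == py_totals
print(sql_totals)
```

The point is not the specific syntax but that Databricks lets each persona work in the language they are most productive in, against the same underlying data.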
4. Databricks Notebooks
When performing any data process within the Databricks UI, you will likely be writing your code in one of two areas.
The first, and most universal, location would be within Databricks notebooks. These are based on the open-source Jupyter notebook framework, which is very popular and common for many data personas. Databricks notebooks have several optimizations and enhancements over the open-source variety.
Some of these enhancements are shown in the screenshot here. There is a built-in, enhanced place to visualize data, where you can also interact with the visualization. You can also collaborate in real time within the same notebook, even leaving comments for other users to see.
5. SQL Editor
If you are a SQL user, then you will likely prefer to write your queries in the SQL Editor. This view on Databricks is optimized and designed to be familiar for the analyst who comes from a traditional data warehouse environment. You will have an area to write your code, a place to see the results of your query, and a pane on the left to explore the datasets available to you.
6. Databricks Connect
While there are many benefits to working directly in the Databricks UI, many data personas have learned a specific workflow with their favorite tools. If you are someone who likes to use a specific IDE, then you are in luck! Databricks Connect is a way to connect your favorite IDE to your Databricks environment. This way, you can send your processes directly to a Databricks cluster, while still being able to leverage some of the more sophisticated components of an IDE.
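As an illustration, a Databricks Connect session might be set up like this. This is a configuration sketch rather than runnable code: it assumes the databricks-connect package is installed and that workspace credentials have already been configured locally, and the table name is just an example.

```python
# Assumes `databricks-connect` is installed and workspace
# authentication (host, token, cluster) is already configured.
from databricks.connect import DatabricksSession

# Create a Spark session backed by a remote Databricks cluster.
spark = DatabricksSession.builder.getOrCreate()

# From here, code written in your local IDE executes on the cluster.
df = spark.read.table("samples.nyctaxi.trips")  # example table name
df.show(5)
```

With this in place, you keep IDE features like debugging, refactoring, and version-control integration, while the heavy computation runs on Databricks.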
7. Let's practice!
Now, let us review some of the main concepts regarding analysis in Databricks.