1. Managing Data Catalogs
In this scenario, I will assume the role of a data analyst at Amazon. I have just received a new dataset from one of our other analyst teams, and they want to start querying their data from Databricks. Starting out in the Databricks UI, I will navigate to the Catalog Explorer to see what data is available to us. Looking through the catalogs, I can confirm that this dataset does not already exist in our Unity Catalog implementation. There are a number of ways to create tables from my data, both through the UI and programmatically. In this case, I have already uploaded the data to the Databricks File System, or DBFS, so I will use code to create a couple of tables. I can use this command to take a look at the folder where my files are hosted. I see that I have a folder of Parquet files that represents my data table. Reading this data into a DataFrame, I can see that it relates to product reviews on the online platform. In the next notebook cell, I can create tables in this catalog based on the Parquet files that I have. Databricks is then able to read those files, infer the data structure, and create new Delta tables in Unity Catalog. To verify that my tables are now in Databricks, I will query them with a simple SQL query. And just like that, I have my data in Delta tables and am ready to use Databricks to fuel my analytics. Let's practice creating some tables in the following exercises.

2. Let's practice!
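The create-and-verify steps described above can be sketched in Databricks SQL. This is a notebook sketch, not the exact code from the video: the DBFS path and the three-level table name (catalog.schema.table) are hypothetical stand-ins, and the earlier folder inspection would typically use `dbutils.fs.ls` in a Python cell.

```sql
-- Create a Delta table in Unity Catalog from the uploaded Parquet files.
-- Databricks reads the files and infers the schema automatically.
-- The path and the name main.reviews.product_reviews are illustrative.
CREATE TABLE IF NOT EXISTS main.reviews.product_reviews
AS SELECT * FROM parquet.`dbfs:/FileStore/product_reviews/`;

-- Verify the new table with a simple query.
SELECT * FROM main.reviews.product_reviews LIMIT 10;
```

In Unity Catalog, the three-level name pins the table to a specific catalog and schema, so other analyst teams can find and query it without knowing where the underlying files live.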