1. Introduction to BigQuery
Hello and welcome to Introduction to BigQuery. My name is Matt, and I will be your instructor. First, we will learn more about BigQuery, what makes it unique, and where it fits among other data warehouse solutions.
2. What is BigQuery?
BigQuery, just like the name implies, allows you to run big queries! This means that if you know SQL, you can use BigQuery to query terabytes and petabytes of data. BigQuery is an enterprise data warehouse that scales to work with data in the largest organizations in the world. BigQuery was launched in 2012. It was built using several internal tools at Google for storage, analytics, and computing capabilities. These are the same tools Google used for their search capabilities to index the entire internet.
3. What makes BigQuery unique?
BigQuery is an OLAP, or online analytical processing, data warehouse tool that allows us to perform queries on big data. It does this by separating the compute, or processing power, and the data storage. It is serverless, meaning that Google handles the management of processing resources.
4. Compute and storage
One of the most important concepts of BigQuery is understanding the separation of computing power and storage. In this image, we can see that the data on the left is stored separately from the compute clusters on the right. BigQuery can allocate the necessary amount of computing resources, meaning that, as users, we can let BigQuery do the hard work of resource management. As an analogy, imagine that you are a baker who owns a cookie company. If you choose to invest in a professional kitchen, you still pay for it even when you are not using it. On the other hand, if you rent space in a shared kitchen for only the times you need, you only pay for what you use.
5. Snowflake and BigQuery
It is also important to understand how BigQuery compares to other data warehouse services. Snowflake differs from BigQuery in that it is very popular with developers for BI applications with more dynamic querying, compared to BigQuery, which works with large data for analytical reports at specific times. Snowflake can also run on Google Cloud, AWS, or Azure, whereas BigQuery is Google Cloud only. Finally, Snowflake has sized tiers of compute resources while also being serverless, compared to BigQuery, which decides what scale of compute matches your query.
6. Redshift and BigQuery
Redshift is a data warehousing service unique to Amazon Web services. It is similar to BigQuery in that it has serverless tools, as well as dedicated compute clusters to support ad hoc queries. This means that Redshift can support use cases such as live or real-time dashboarding, whereas BigQuery focuses on analytical reporting such as once-an-hour or once-a-day reports.
7. Traditional SQL databases and BigQuery
Traditional SQL databases such as MySQL or PostgreSQL are built to handle transactions such as inserts, updates, and deletes. These are known as online transactional processing databases. The storage and compute are coupled together just like your computer and queries workloads are not distributed. BigQuery on the other hand is an online analytical processing data warehouse or OLAP. As you know the compute and storage are separate, and queries are distributed. While you can use traditional SQL databases for analytics, these terms will help you understand the differences between the two.
8. How is BigQuery used?
While there are endless ways that we can use BigQuery, it tends to work best with scheduled reports such as daily sales report, periodic complex queries such as quarterly reports or presentations, and ad-hoc discovery on raw or processed data such as marketing analytics.
9. Let's practice!
Now let's take a look at what these use cases look like in practice!