1. Smart analytics
Hello! We have learned how to store different types of data on GCP. In this video, we will learn how to use the smart analytics tools that GCP offers.
2. Databases
Before we delve into the tooling, let’s step back and consider databases: the standard method of storing structured data. They are like traditional filing cabinets in a bank. Data is stored in tables of rows and columns, akin to neatly arranged papers in folders, which allows easy access and editing, perfect for day-to-day operations. For this, GCP offers Cloud SQL, a fully managed relational database service.
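To make the filing cabinet picture concrete, here is a minimal sketch of a row-column table being written, edited, and read. It uses Python's built-in sqlite3 module as a local stand-in, not Cloud SQL itself, and the table name, column names, and values are made up for illustration.

```python
import sqlite3

# In-memory database used as a local stand-in for a relational database.
conn = sqlite3.connect(":memory:")

# Hypothetical table: one row per account, one column per attribute.
conn.execute(
    "CREATE TABLE accounts ("
    "account_id INTEGER PRIMARY KEY, owner TEXT, balance REAL)"
)

# Day-to-day operations: add rows, then edit one of them.
conn.executemany(
    "INSERT INTO accounts (owner, balance) VALUES (?, ?)",
    [("Ada", 1200.0), ("Grace", 560.5)],
)
conn.execute(
    "UPDATE accounts SET balance = balance - 200 WHERE owner = ?",
    ("Ada",),
)
conn.commit()

# Read the table back, one tuple per row.
rows = conn.execute(
    "SELECT owner, balance FROM accounts ORDER BY account_id"
).fetchall()
print(rows)
```

The same kind of SQL statements would be sent to a managed relational database; only the connection step would differ.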
3. The global bank
Now imagine this bank has a hundred branches, each with several of these filing cabinets, or databases. Many branches might need to access the same data at the same time. We need a single database that scales up easily and can handle many read and write operations simultaneously.
4. The infinite database
GCP’s Cloud Spanner is a possible solution. It is a fully managed relational database with virtually unlimited scale, and it is built to handle many read and write operations simultaneously. In essence, it combines the benefits of traditional relational databases with the scalability of non-relational databases.
5. Analytic needs
Now, let's say we want to calculate the average size of withdrawals from the bank on a daily basis. This requires repeated operations over a very large-scale database. For this, we need a data warehouse. Much like a real warehouse, data is collected, sorted, and collated in a data warehouse. The main objective here is to get the data ready for analysis.
6. Analytics in the warehouse
Imagine each branch of the bank sending daily updates to a central hub, the data warehouse. Whenever the average size of withdrawals is required, for example, the data warehouse can be queried. Not only does it provide a fast and consistent source of analytics for the whole bank, it also avoids duplication of work. A data warehouse is designed for structured data and is optimized for reading and querying rather than for the frequent small writes of day-to-day transactions.
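The daily average withdrawal from our example can be sketched as a single aggregate query over the collected data. This is only an illustration that assumes a hypothetical withdrawals table with made-up rows, and it again uses Python's built-in sqlite3 module as a small local stand-in for a real data warehouse.

```python
import sqlite3

# Local stand-in for the central hub that receives branch updates.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE withdrawals (branch TEXT, day TEXT, amount REAL)"
)

# Made-up daily updates sent by two hypothetical branches.
updates = [
    ("north", "2024-01-01", 100.0),
    ("north", "2024-01-01", 300.0),
    ("south", "2024-01-01", 200.0),
    ("north", "2024-01-02", 50.0),
    ("south", "2024-01-02", 150.0),
]
warehouse.executemany("INSERT INTO withdrawals VALUES (?, ?, ?)", updates)

# One query answers the question for the whole bank, across all branches.
daily_avg = warehouse.execute(
    "SELECT day, AVG(amount) FROM withdrawals GROUP BY day ORDER BY day"
).fetchall()

for day, avg_amount in daily_avg:
    print(day, avg_amount)
```

Because every branch writes to the same place, the average is computed once, centrally, instead of each branch repeating the calculation on its own data.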
7. Analytics with GCP
BigQuery is GCP’s fully managed, serverless data warehouse, which runs complex analytical queries across large datasets. It scales up and down automatically to match the workload.
GCP adds another layer of functionality to BigQuery using Looker: a business intelligence platform that provides data visualization and interactive, real-time analytics. Its integration with BigQuery enables users to create custom reports and dashboards that can provide a deeper understanding of the data.
8. Data lakes
Data lakes are another important type of data storage. They are like safe deposit boxes that can store any type of valuable, ranging from legal documents to jewelry. A data lake is a centralized data store for structured, semi-structured, and unstructured data.
9. Data lakes
Data lakes can store raw, unstructured data at scale, without the need to categorize or organize it beforehand. This means they can act as central repositories for data from different sources. This also makes data lakes highly flexible and capable of handling vast volumes of disparate data types. They collect petabytes of data from various sources such as social media feeds, mobile devices, and customer transactions. The goal is to perform big data analytics and machine learning to gain insights that aren't feasible with more traditional structured data storage.
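As a rough illustration of storing data without organizing it first, the sketch below drops three differently shaped items into one folder and only interprets them when they are read. The temporary local folder stands in for a data lake, and the file names and contents are made up; this is not how any particular GCP service is implemented.

```python
import csv
import json
import tempfile
from pathlib import Path

# A temporary local folder plays the role of the data lake.
lake = Path(tempfile.mkdtemp())

# Raw data from different sources lands as-is, with no shared schema.
(lake / "transactions.csv").write_text("account,amount\nA1,120\nA2,80\n")
(lake / "social_post.json").write_text(
    json.dumps({"user": "ada", "text": "Great service!"})
)
(lake / "scan.bin").write_bytes(b"\x00\x01\x02")

stored = sorted(p.name for p in lake.iterdir())
print(stored)

# Structure is applied only when a file is read for analysis.
with open(lake / "transactions.csv", newline="") as f:
    total = sum(float(row["amount"]) for row in csv.DictReader(f))

post = json.loads((lake / "social_post.json").read_text())
print(total, post["user"])
```

Nothing had to be categorized when the files were written; the CSV and JSON were parsed only at the moment a question was asked of them.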
GCP offers a cloud-based data lake service that is a scalable and cost-effective way to bring data from diverse sources onto a unified platform. BigLake connects BigQuery to data lakes, so data stored in the lake can be analyzed at scale with BigQuery.
10. Let's practice!
Now let’s practice what we have learned about smart analytics!