Storage Solution Options on Google Cloud

1. Storage Solution Options on Google Cloud

Data engineers rely on several key Google Cloud products. One of the main ones is Cloud Storage, which is well suited to unstructured data. Objects in Cloud Storage are accessed by using HTTP requests, including ranged GETs that retrieve only a portion of the data. The only key is the object name. Each object has metadata, but the object itself is treated as unstructured bytes. The scale of the system allows for serving large static content and accepting user-uploaded content, including videos, photos, and files. Objects can be up to five terabytes each.

Cloud Storage is built for availability, durability, scalability, and consistency. It's an ideal solution for hosting static websites and for storing images, videos, objects, blobs, and any other unstructured data. Cloud Storage has four primary storage classes: Standard, Nearline, Coldline, and Archive. The classes are differentiated by how frequently objects are expected to be accessed.

For structured data, Google Cloud offers a full range of cost-effective storage services to choose from. No one size fits all, and your choice of storage and database solutions will depend on your application and workload. Cloud SQL is Google Cloud's managed relational database service. AlloyDB is a fully managed, high-performance, PostgreSQL-compatible database service. Spanner is a fully managed relational database service that offers both strong consistency and horizontal scalability. Firestore is a fast, fully managed, serverless NoSQL document database built for automatic scaling, high performance, and ease of application development. BigQuery is a fully managed, serverless enterprise data warehouse for analytics. Bigtable is a high-performance NoSQL database service built for fast key-value lookups, with consistent latency below 10 milliseconds.
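To make the ranged GET mentioned for Cloud Storage a bit more concrete, here is a minimal sketch that builds (but does not send) an HTTP request for the first kilobyte of an object. It assumes the public path-style endpoint `https://storage.googleapis.com/BUCKET/OBJECT`; the helper name `build_ranged_get`, the bucket name, and the object name are hypothetical and only there for illustration.

```python
from urllib.parse import quote
from urllib.request import Request


def build_ranged_get(bucket: str, object_name: str,
                     first_byte: int, last_byte: int) -> Request:
    """Build, without sending, a GET for bytes first_byte..last_byte (inclusive)."""
    if first_byte < 0 or last_byte < first_byte:
        raise ValueError("need 0 <= first_byte <= last_byte")
    # The object name is the only key, so it goes straight into the URL path.
    # Percent-encode it, but keep "/" so folder-like names stay readable.
    url = f"https://storage.googleapis.com/{bucket}/{quote(object_name, safe='/')}"
    request = Request(url, method="GET")
    # Standard HTTP Range header; both ends of the range are inclusive.
    request.add_header("Range", f"bytes={first_byte}-{last_byte}")
    return request


# Hypothetical bucket and object names, for illustration only.
req = build_ranged_get("example-bucket", "videos/intro clip.mp4", 0, 1023)
print(req.full_url)
print(req.get_header("Range"))
```

Actually sending the request (for example with `urllib.request.urlopen`) would need network access and, for a private object, an authorization header, which this sketch leaves out. A server that honors the range normally answers with HTTP status 206 (Partial Content).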
Two key concepts in data engineering are the data lake and the data warehouse. A data lake is a vast repository for storing raw, unprocessed data in various formats, including unstructured, semi-structured, and structured. It serves as a centralized storage solution for diverse data types, enabling flexible use cases like data science, applications, and business decision making. A data warehouse is a structured repository designed for storing pre-processed and aggregated data from multiple sources. Primarily used for long-term business analysis, it enables efficient querying and reporting for informed decision making. Data warehouses often operate as standalone systems, independent of other data storage solutions.

BigQuery, Google Cloud's fully managed, serverless enterprise data warehouse for analytics, has built-in features like machine learning, geospatial analysis, and business intelligence. It can scan terabytes in seconds and petabytes in minutes. BigQuery is a great solution for online analytical processing (OLAP) workloads for big data exploration and processing, and it is also well suited for reporting with business intelligence tools.

BigQuery has several easy-to-use options for accessing data. The first is the SQL editor in the Google Cloud console. The second is the bq command-line tool, which is part of the Google Cloud SDK. The last is a robust REST API, which can be called through client libraries in seven programming languages. BigQuery organizes data tables into units called datasets. These datasets are scoped to your Google Cloud project. When you reference a table from the command line, in SQL queries, or in code, you refer to it by using the construct project.dataset.table. Access control is handled through IAM and can be set at the dataset, table, view, or column level. To query data in a table or view, you need at least read permission on that table or view.
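The project.dataset.table construct can be illustrated with a small sketch that splits a fully qualified table name into its three parts and builds a query string from it. This is plain Python and does not call BigQuery; the names `TableRef` and `count_rows_sql`, and the table `my-project.sales.orders`, are hypothetical. The sketch assumes that the dataset and table names themselves contain no dots.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRef:
    project: str
    dataset: str
    table: str

    @classmethod
    def parse(cls, table_id: str) -> "TableRef":
        # Split from the right so that any dots in the project part survive.
        parts = table_id.rsplit(".", 2)
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected project.dataset.table, got {table_id!r}")
        return cls(*parts)

    def quoted(self) -> str:
        # Backticks allow characters such as hyphens in the project ID.
        return f"`{self.project}.{self.dataset}.{self.table}`"


def count_rows_sql(ref: TableRef) -> str:
    """Return a simple query that counts the rows of the referenced table."""
    return f"SELECT COUNT(*) AS row_count FROM {ref.quoted()}"


# Hypothetical table name, for illustration only.
ref = TableRef.parse("my-project.sales.orders")
print(ref.dataset)
print(count_rows_sql(ref))
```

The resulting query text could be pasted into the console's SQL editor or passed to the bq tool. Running it would still require read permission on the table, as described above.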

2. Let's practice!

Introduction to Data Engineering on Google Cloud
