Data Sources Versus Data Sinks
1. Data Sources Versus Data Sinks
The ingest stage of a data pipeline is the point where data enters the pipeline as a data source and becomes available for use downstream. Think of a data source as the starting point of your data journey. It is raw, unprocessed data waiting to be transformed into valuable insights. Any system, application, or platform that creates, stores, or shares data can be considered a data source. Two examples of Google Cloud products used in the ingest phase are Cloud Storage, a data lake holding various types of data sources, and Pub/Sub, an asynchronous messaging system delivering data from external systems.

The transform stage of a data pipeline represents action taken on a data source to adjust, modify, join, or customize it so that it matches a specific downstream data or reporting requirement. There are three main transformation patterns: extract and load (EL); extract, load, and transform (ELT); and extract, transform, and load (ETL). You will explore each of these patterns in its own module later in the course.

The store stage of a data pipeline represents the last step, when we deposit data in its final form. A data sink is the final stop in the data journey. It's where processed and transformed data is stored for future use, analysis, and decision-making. Think of it as the reservoir at the end of the river, where valuable information is collected and readily available. Two examples of Google Cloud products used in the store phase are BigQuery, a serverless data warehouse, and Bigtable, a highly scalable NoSQL database.

2. Let's practice!
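As a warm-up, the three stages described above (ingest, transform, store) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the in-memory list `RAW_EVENTS` stands in for a data source such as files in Cloud Storage, the dictionary `sink` stands in for a data sink such as a BigQuery table, and the function names `ingest`, `transform`, and `store` are hypothetical, not part of any Google Cloud API.

```python
from typing import Iterable, Iterator

# Hypothetical raw records standing in for a data source (assumption for illustration).
RAW_EVENTS = [
    {"user": " Ada ", "amount": "12.50"},
    {"user": "grace", "amount": "7.25"},
    {"user": " Ada ", "amount": "3.00"},
]


def ingest(source: Iterable[dict]) -> Iterator[dict]:
    """Ingest stage: read raw records from the data source unchanged."""
    for record in source:
        yield record


def transform(records: Iterable[dict]) -> Iterator[dict]:
    """Transform stage: tidy user names and convert amounts to numbers."""
    for record in records:
        yield {
            "user": record["user"].strip().title(),
            "amount": float(record["amount"]),
        }


def store(records: Iterable[dict], sink: dict) -> None:
    """Store stage: accumulate a total per user in the data sink."""
    for record in records:
        sink[record["user"]] = sink.get(record["user"], 0.0) + record["amount"]


# An in-memory dictionary standing in for a data sink (assumption for illustration).
sink: dict = {}
store(transform(ingest(RAW_EVENTS)), sink)
print(sink)  # {'Ada': 15.5, 'Grace': 7.25}
```

The raw records leave the source untouched, are reshaped in the transform step, and only reach the sink in their final form, which mirrors the source-to-sink flow described in this lesson.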