
Datastream

1. Datastream

Datastream enables continuous replication of your on-premises or multi-cloud relational databases, such as Oracle, MySQL, PostgreSQL, or SQL Server, into Google Cloud. Its change data capture options let you either backfill historical data or propagate only new changes, with data landing in Cloud Storage or BigQuery for analytics. You have flexibility in connectivity options and can selectively replicate data at the schema, table, or column level.

Datastream enables real-time data replication from source systems for various use cases. It supports direct replication into BigQuery for analytics, allows custom data processing in Dataflow before loading into BigQuery, and facilitates event-driven architectures. Additionally, Datastream can be used with Dataflow templates for database replication and migration tasks, making it a versatile tool for integrating data into Google Cloud.

Datastream taps into the source database's transaction log (for example, a write-ahead log, or WAL) to capture changes and propagate them downstream. It reads the logging mechanism specific to each source database: LogMiner for Oracle, the binary log for MySQL, logical decoding for PostgreSQL, and transaction logs for SQL Server. These change events, such as inserts, updates, and deletes, are processed by Datastream and transformed into structured formats like Avro or JSON, ready to land in Google Cloud, for example as files in Cloud Storage or as rows in BigQuery tables. This enables near real-time data replication for analytics and other use cases.

Datastream event messages contain two main sections: generic metadata and payload. The metadata provides context about the data, such as the source table, timestamps, and related information. The payload contains the actual data changes in a key-value format, mapping column names to their corresponding values. This structure keeps replication organized and makes changes easy to track.
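To make the capture-and-apply flow concrete, here is a minimal sketch of how a downstream consumer might replay change events into a replica. It assumes a simplified event shape with generic metadata (a timestamp), source metadata (table and change type), and a key-value payload; the field names, the INSERT/UPDATE/DELETE change types, and the `apply_event`/`replicate` helpers are hypothetical illustrations, not Datastream's exact output schema or API.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]
Replica = Dict[str, Dict[Any, Row]]

# Hypothetical change events, loosely modeled on the structure described above:
# generic metadata, source metadata, and a key-value payload.
# Field names and values are illustrative assumptions, not Datastream's exact schema.
EVENTS: List[Dict[str, Any]] = [
    {"source_timestamp": "2024-01-01T00:00:01Z",
     "source_metadata": {"table": "customers", "change_type": "INSERT"},
     "payload": {"id": 1, "name": "Ada", "balance": "10.50"}},
    {"source_timestamp": "2024-01-01T00:00:02Z",
     "source_metadata": {"table": "customers", "change_type": "INSERT"},
     "payload": {"id": 2, "name": "Grace", "balance": "7.25"}},
    {"source_timestamp": "2024-01-01T00:00:03Z",
     "source_metadata": {"table": "customers", "change_type": "UPDATE"},
     "payload": {"id": 1, "name": "Ada", "balance": "12.00"}},
    {"source_timestamp": "2024-01-01T00:00:04Z",
     "source_metadata": {"table": "customers", "change_type": "DELETE"},
     "payload": {"id": 2, "name": "Grace", "balance": "7.25"}},
]


def apply_event(replica: Replica, event: Dict[str, Any], key: str = "id") -> None:
    """Apply one change event to an in-memory replica keyed by primary key."""
    meta = event["source_metadata"]
    row = event["payload"]
    table = replica.setdefault(meta["table"], {})
    change = meta["change_type"]
    if change in ("INSERT", "UPDATE"):
        # Upsert: assume the payload carries the full row image after the change.
        table[row[key]] = dict(row)
    elif change == "DELETE":
        table.pop(row[key], None)
    else:
        raise ValueError(f"unsupported change type: {change}")


def replicate(events: List[Dict[str, Any]]) -> Replica:
    """Replay events in source-timestamp order (ISO strings sort chronologically)."""
    replica: Replica = {}
    for event in sorted(events, key=lambda e: e["source_timestamp"]):
        apply_event(replica, event)
    return replica


if __name__ == "__main__":
    print(replicate(EVENTS))
```

Replaying the four sample events leaves only customer 1, with the updated balance, because customer 2 is inserted and later deleted. Ordering by the source timestamp is a design choice for this sketch; a real pipeline would also need to handle duplicates and out-of-order delivery.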
Datastream event messages also include source-specific metadata alongside the generic metadata and payload. This metadata describes the data's origin within the source system, including details like the database name, schema, table, change type (such as INSERT), and other system-specific identifiers. This additional information helps track data lineage and understand the context of changes replicated from the source database.

Datastream simplifies data replication by using unified data types to map between different source and destination databases. Regardless of whether your source data is stored in Oracle as number, in MySQL as decimal, in PostgreSQL as numeric, or in SQL Server as decimal, Datastream consistently represents it as decimal during replication. When this data lands in Google Cloud, it is mapped to the data type of the destination format: decimal in Avro, number in JSON, or numeric in native BigQuery tables. This keeps data types consistent and compatible across different database systems and streamlines the replication process.

In summary, Google Cloud offers several data migration and replication options. The 'gcloud storage' command is suitable for smaller online transfers. Storage Transfer Service handles larger online transfers efficiently. Transfer Appliance is ideal for massive offline data migrations, and Datastream provides continuous online replication of structured data, supporting both batch and streaming velocities. Choose the option that best fits your data size, transfer type, and data availability requirements.
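The unified type mapping described above can be pictured as two lookup steps: source type to unified type, then unified type to destination type. The sketch below encodes only the numeric example from this lesson; the table names and the `destination_type` helper are hypothetical, and the full, authoritative mapping lives in Datastream's documentation.

```python
from typing import Dict, Tuple

# Step 1: source-specific type -> Datastream unified type (only the numeric example).
SOURCE_TO_UNIFIED: Dict[Tuple[str, str], str] = {
    ("oracle", "NUMBER"): "DECIMAL",
    ("mysql", "DECIMAL"): "DECIMAL",
    ("postgresql", "NUMERIC"): "DECIMAL",
    ("sqlserver", "DECIMAL"): "DECIMAL",
}

# Step 2: unified type -> destination-specific type.
UNIFIED_TO_DESTINATION: Dict[str, Dict[str, str]] = {
    "DECIMAL": {"avro": "decimal", "json": "number", "bigquery": "NUMERIC"},
}


def destination_type(source: str, source_type: str, destination: str) -> str:
    """Resolve a source column type to a destination type via the unified type."""
    unified = SOURCE_TO_UNIFIED.get((source.lower(), source_type.upper()))
    if unified is None:
        raise KeyError(f"no mapping for {source} type {source_type}")
    return UNIFIED_TO_DESTINATION[unified][destination.lower()]


if __name__ == "__main__":
    for src, typ in SOURCE_TO_UNIFIED:
        print(src, typ, "->", destination_type(src, typ, "bigquery"))
```

Routing every source through one intermediate type means each new source or destination needs only one mapping table, rather than a separate mapping for every source and destination pair.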

2. Let's practice!
