Data management concepts

1. Data management concepts

Organizations need a modern approach to enterprise data to manage the vast volumes of data they produce. The options often include databases, data warehouses, and data lakes. Let’s explore each of these options, starting with databases.

A database is an organized collection of data, often stored in tables, and accessed electronically from a computer system. Let’s examine two types of databases: relational and non-relational.

A relational database stores and provides access to data points that are related to one another. This means storing information in tables, rows, and columns with a clearly defined schema that represents the structure, or logical configuration, of the database. A relational database can establish links, or relationships, between pieces of information by joining tables, and structured query language, or SQL, can be used to query and manipulate the data. Relational databases are highly consistent and reliable, and they are best suited for dealing with large amounts of structured data. They’re designed for business data processing and for storing the online transactional data needed to support the daily operations of a company.

A non-relational database, sometimes known as a NoSQL database, is less structured and doesn’t use the tabular format of rows and columns that relational databases use. Instead, non-relational databases follow a flexible data model, which makes them ideal for storing data whose organization changes frequently or for applications that handle diverse types of data. Examples include organizing large quantities of complex and diverse data, or handling data that regularly evolves to meet new business requirements.

Choosing the right database depends on the use case. Google Cloud relational database products include Cloud SQL and Spanner, while Bigtable is a non-relational database product. We’ll look at these products in more detail later. Let’s explore another data management concept, the data warehouse.
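To make the database discussion above more concrete, here is a minimal sketch in Python. It uses the standard library's `sqlite3` module as a stand-in for a relational database (it is not Cloud SQL or Spanner), and plain dictionaries as a stand-in for non-relational records. The table names, column names, and sample values are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for a relational database.
conn = sqlite3.connect(":memory:")

# A clearly defined schema: two tables linked through customer_id.
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount      REAL NOT NULL
    );
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Ada"), (2, "Grace")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0)])


def total_spent_per_customer(connection):
    """Join customers to orders and sum the order amounts per customer."""
    return connection.execute("""
        SELECT c.name, SUM(o.amount)
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.customer_id
        GROUP BY c.customer_id, c.name
        ORDER BY c.name
    """).fetchall()


totals = total_spent_per_customer(conn)

# Non-relational style (illustrative only): records in the same collection
# do not have to share the same fields.
documents = [
    {"id": "p1", "type": "book", "title": "SQL Basics", "pages": 200},
    {"id": "p2", "type": "video", "title": "Intro", "duration_s": 300,
     "tags": ["cloud"]},
]


def fields_used(docs):
    """Return every field name that appears in at least one record."""
    return sorted({key for doc in docs for key in doc})


all_fields = fields_used(documents)
```

In the relational half, the relationship between the two tables is expressed by the shared `customer_id` column and resolved at query time with a join. In the second half, each record carries its own set of fields, which is roughly what "flexible data model" means in practice.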
Like a database, a data warehouse is a place to store data. However, while a database is designed to capture data for storage, retrieval, and use, a data warehouse is designed to analyze data. A data warehouse is an enterprise system used for the analysis and reporting of structured and semi-structured data from multiple sources. Think of the data warehouse as the central hub for all business data. Business data might include point-of-sale transactions, marketing automation, or even customer relationship management data. Suited for both ad hoc analysis and custom reporting, a data warehouse can help analyze sales and identify trends, because it can store both current and historical data in one place. This capability can provide a long-range view of data over time, which makes a data warehouse a primary component of business intelligence. BigQuery is Google Cloud’s data warehouse offering. We’ll explore BigQuery in more detail later.

Although data warehouses handle structured and semi-structured data, they’re not typically the answer for large amounts of unstructured data, like images, videos, and documents. Unstructured data, which doesn’t conform to a well-defined schema, is often disregarded in traditional analytics.

A data lake is a repository designed to ingest, store, explore, process, and analyze any type or volume of raw data, regardless of the source: operational systems, web sources, social media, or Internet of Things (IoT) devices. It can store different types of data in their original format, regardless of size, and without much pre-processing or added structure. Having this unprocessed, raw data available for analysis helps avoid unintentionally contaminating the data or adding bias. It also means that the raw data can be enriched by merging it with other data.
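The relationship between raw data in a lake and cleaned data in a warehouse can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `raw_lake` dictionary is a hypothetical stand-in for object storage, the file paths, field names, and figures are made up, and the "warehouse" is just a list of uniform rows rather than a real product such as BigQuery.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical "data lake": payloads are kept byte-for-byte in their original
# format, whether structured (CSV), semi-structured (JSON lines), or
# unstructured (an image, represented here by a PNG file signature only).
raw_lake = {
    "pos/2023.csv": (
        b"date,store,amount\n"
        b"2023-01-05,north,100.0\n"
        b"2023-02-10,north,50.0\n"
    ),
    "web/2024.jsonl": (
        b'{"ts": "2024-01-07", "store": "north", "total": "120.5"}\n'
        b'{"ts": "2024-03-01", "store": "south", "total": "30"}\n'
    ),
    "media/logo.png": b"\x89PNG\r\n\x1a\n",
}


def load_sales(lake):
    """Clean the sales files into uniform rows; leave everything else alone."""
    rows = []
    for path, payload in lake.items():
        if path.endswith(".csv"):
            reader = csv.DictReader(io.StringIO(payload.decode("utf-8")))
            for rec in reader:
                rows.append({"year": int(rec["date"][:4]),
                             "store": rec["store"],
                             "amount": float(rec["amount"])})
        elif path.endswith(".jsonl"):
            for line in payload.decode("utf-8").splitlines():
                rec = json.loads(line)
                rows.append({"year": int(rec["ts"][:4]),
                             "store": rec["store"],
                             "amount": float(rec["total"])})
        # Other formats stay in the lake untouched; nothing is deleted.
    return rows


def yearly_totals(rows):
    """Warehouse-style report: total sales per year across all sources."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["year"]] += row["amount"]
    return dict(sorted(totals.items()))


warehouse_rows = load_sales(raw_lake)
report = yearly_totals(warehouse_rows)
```

The raw payloads are never modified, so they remain available for other kinds of exploration, while the cleaned rows share one schema and can answer a predefined question such as sales per year across current and historical data.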
A data lake differs from a data warehouse, which contains structured data that has been cleaned and processed, ready for strategic analysis based on predefined business needs. Data lakes often consist of many different products, depending on the nature of the data that is ingested. For example, the best Google Cloud products for storing structured data are Cloud SQL, Spanner, or BigQuery. For semi-structured data, the options include Datastore and Bigtable. And for storing unstructured data, Cloud Storage is an option.

Data warehouses and data lakes should be considered complementary instead of competing tools. Although both store data in some capacity, each is optimized for different uses. Traditional data warehouse users are business intelligence analysts who are closer to the business and focus on driving insights from data. These users traditionally use the data to answer questions. Data lake users include data engineers and data scientists, as well as analysts. They’re closer to the raw data and have the tools and capabilities to explore, mine, and experiment with it. These users find answers in the data, but they also find questions.

As enterprises increasingly focus on data-driven decision making, data warehouses and data lakes play a critical role in an organization’s digital transformation journey. Democratization of data lets users gain a deeper understanding of business situations because they have more context than ever before. Today, organizations need a 360-degree, real-time view of their businesses to gain a competitive edge.

2. Let's practice!
