
Managing data

1. Managing data

In this video, we'll learn how to keep track of our data with the help of databases and other tools.

2. Databases: basic concepts

A database is a general, loose term for the storage of data. A database is typically managed with a database management system; this is software that allows the user to store, retrieve and update the data. There are several specialized types of databases. A document database stores unstructured or loosely structured data. Relational databases store structured data. These are called 'relational' because they usually store multiple tables that are related to each other. For example, one data table contains the sales data of a particular product. Through the customer id, another data table can be accessed to get the complete customer data. Another important distinction is a data warehouse versus a data lake. A data warehouse contains processed, organized data in preparation for further analysis. On the other hand, a data lake is used to store raw data that has not been prepared yet. Typically, designing and optimizing database systems is the responsibility of a data engineer. They help ensure that the necessary data is available and ready to be used.
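To make the sales and customer example concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names and sample rows are made up for illustration; a real system would have its own schema.

```python
import sqlite3

# In-memory database; all table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name TEXT,
        city TEXT
    );
    CREATE TABLE sales (
        sale_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        product TEXT,
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Ada', 'London'), (2, 'Grace', 'New York');
    INSERT INTO sales VALUES
        (10, 1, 'Widget', 19.5),
        (11, 2, 'Widget', 19.5),
        (12, 1, 'Gadget', 5.0);
""")

# Follow the customer id from each sale to the matching customer row.
rows = conn.execute("""
    SELECT s.sale_id, s.product, c.name, c.city
    FROM sales AS s
    JOIN customers AS c ON c.customer_id = s.customer_id
    ORDER BY s.sale_id
""").fetchall()

for row in rows:
    print(row)
```

The customer id is the link between the two tables: each sale only stores the id, and the join looks up the rest of the customer data when it is needed.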

3. Data storage in the cloud

A lot of data is stored in the cloud nowadays, which means the data is stored on remote servers and accessed over the Internet. A specialized third party typically provides these services. Storing data in the cloud instead of on on-site servers is often more flexible and cost-effective. Still, it can be problematic with sensitive data, as you would be dependent on the security of the third-party provider.

4. Automation through data pipelines

The purpose of pipelines is to move data from one database to another. This process can be automated using the ETL framework. ETL stands for Extract, Transform, Load, referring to the three processing steps: the data is extracted from its source, transformed into the format needed, and loaded into its destination. Making use of pipelines helps ensure the availability of up-to-date and accurate data.
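The three ETL steps can be sketched as three small Python functions. This is only an illustration under stated assumptions: the source and target are in-memory SQLite databases, and the raw_sales and sales tables, their columns, and the cleaning rules are all hypothetical.

```python
import sqlite3


def extract(source):
    """Extract: read the raw rows from the source database."""
    return source.execute(
        "SELECT customer, amount_cents FROM raw_sales"
    ).fetchall()


def transform(rows):
    """Transform: drop incomplete rows, tidy names, convert cents to units."""
    cleaned = []
    for customer, amount_cents in rows:
        if customer is None or amount_cents is None:
            continue  # skip rows with missing values
        cleaned.append((customer.strip().title(), amount_cents / 100))
    return cleaned


def load(target, rows):
    """Load: write the prepared rows into the target database."""
    target.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)"
    )
    target.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    target.commit()


# Hypothetical source holding raw, unprepared data.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE raw_sales (customer TEXT, amount_cents INTEGER)")
source.executemany(
    "INSERT INTO raw_sales VALUES (?, ?)",
    [("  ada ", 1950), (None, 500), ("GRACE", 700)],
)

# Hypothetical target holding processed data ready for analysis.
target = sqlite3.connect(":memory:")
load(target, transform(extract(source)))

result = target.execute(
    "SELECT customer, amount FROM sales ORDER BY customer"
).fetchall()
print(result)
```

In practice a scheduler would run a pipeline like this at regular intervals, which is what keeps the target database up to date.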

5. Getting data from databases

Now that we have all the data in our databases, it is time to get our data ready for analysis. Retrieving data from databases is also called 'querying'. The industry standard for querying is SQL, which stands for Structured Query Language. Further analysis can be done with programming languages like R and Python.
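As a rough illustration of querying, the sketch below runs a SQL query from Python and hands the result back for further analysis. It assumes a small, made-up sales table in an in-memory SQLite database; the column names and values are not from any real dataset.

```python
import sqlite3

# Hypothetical sales table with a few sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Widget", 20.0), ("Gadget", 5.0), ("Widget", 10.0)],
)

# The query itself is plain SQL: total sales per product, largest first.
query = """
    SELECT product, SUM(amount) AS total
    FROM sales
    GROUP BY product
    ORDER BY total DESC
"""
totals = conn.execute(query).fetchall()
print(totals)
```

The SQL does the retrieval and summarizing inside the database, and Python receives only the small result, ready for further analysis.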

6. Dashboards

Another way to leverage the data available in databases is through dashboards. Databases are very technical, but dashboards offer a non-technical way of collecting, managing and sharing data between teams. A dashboard provides information at a glance, typically metrics such as Key Performance Indicators to follow up on business goals. A dashboard gets access to the data by being linked to a database. Dashboards typically show data in a visually appealing way. They can be used for multiple purposes, including doing basic analysis or communicating results. Creating and managing dashboards is typically the responsibility of a data analyst.

7. Let's practice!

Enough theory. It is time to practice!