
Data Storage: Storage types

1. Data Storage: Storage types

Now, we'll review storage, a key component of a data architecture.

2. Blob storage

Let's start with blob storage. This is probably the most general type of storage we could use. Blob stands for binary large object, which means the data is stored as opaque binary objects: the storage layer knows little about their content or structure. That means blob storage supports all types of data. We could store tabular data as CSV, Parquet, or Avro files, or store PDFs, audio files, or whatever else we need. Another great characteristic of blob storage is its scalability. This depends on the provider, but the most popular cloud providers offer virtually unlimited blob storage. Nonetheless, they normally limit the size of a single object. For instance, Amazon S3 has long capped a single object at 5 TB. Still, this is easily overcome by partitioning such objects into smaller ones, so don't worry! Finally, a really attractive characteristic of blob storage is its cost. It's cheap! Per gigabyte stored, it is typically cheaper than most data warehouses or databases.
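To make the partitioning idea concrete, here is a minimal sketch. It assumes a plain Python dictionary as a stand-in for a blob store; the names `store`, `put_in_parts`, and `get_from_parts`, and the `.partNNNNN` key suffix, are hypothetical and not any provider's API.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for a blob store: each key maps to opaque bytes.
store: Dict[str, bytes] = {}


def put_in_parts(key: str, data: bytes, part_size: int) -> List[str]:
    """Split data into parts of at most part_size bytes and store each under its own key."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    part_keys: List[str] = []
    for index, start in enumerate(range(0, len(data), part_size)):
        part_key = f"{key}.part{index:05d}"  # illustrative naming scheme only
        store[part_key] = data[start:start + part_size]
        part_keys.append(part_key)
    return part_keys


def get_from_parts(part_keys: List[str]) -> bytes:
    """Reassemble the original object by concatenating its parts in order."""
    return b"".join(store[part_key] for part_key in part_keys)


payload = b"0123456789" * 3  # 30 bytes standing in for a very large object
keys = put_in_parts("raw/events.bin", payload, part_size=8)
restored = get_from_parts(keys)
```

The store never inspects the bytes, which is exactly why it can hold any format. Real providers usually expose their own multipart mechanisms, so treat this only as an illustration of the principle.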

3. Blob storage use cases

As for use cases, blob storage is a really good choice for unstructured data, and also for backing up or archiving any type of data. Additionally, it is a good first place to store your data when it arrives on your data platform. You could also use it to deliver content to end users, like static website content or files.
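When blob storage is the first stop for arriving data, one common convention is to organize object keys by source and arrival date. The sketch below assumes a `raw/<source>/<yyyy>/<mm>/<dd>/<filename>` layout; that layout and the `landing_key` helper are illustrative assumptions, not a standard.

```python
from datetime import datetime, timezone


def landing_key(source: str, filename: str, arrived_at: datetime) -> str:
    """Build an object key that groups raw files by source system and UTC arrival date."""
    day = arrived_at.astimezone(timezone.utc).strftime("%Y/%m/%d")
    return f"raw/{source}/{day}/{filename}"


key = landing_key(
    "crm",
    "contacts.csv",
    datetime(2024, 3, 9, 14, 30, tzinfo=timezone.utc),
)
print(key)
```

Keys like this make it easy to find, reprocess, or archive everything that arrived on a given day.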

4. SQL

Now we'll review storage solutions that require some structure around the data and, in exchange, provide query capabilities. First, we have SQL-based storage. Here, we'll find relational database management systems, or RDBMSs, like MySQL or PostgreSQL, which are best suited for transactional applications that need strong consistency and integrity. Data warehouses may also fall into this category; they relax some of those transactional guarantees but still provide a SQL-based query engine. Thus, if we're looking to store structured data and query it in complex ways, we could consider SQL-based storage. Some common offerings are the fully managed options from the top three cloud providers.
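Here is a small sketch of what structure buys us, using Python's built-in `sqlite3` module as a stand-in for an RDBMS like MySQL or PostgreSQL. The `customers` and `orders` tables and their rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# The schema is the structure we agree to up front.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount REAL NOT NULL CHECK (amount > 0)
    )
    """
)
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(1, 20.0), (1, 5.0), (2, 12.5)],
)
conn.commit()

# Integrity: an order for a customer that does not exist is rejected and rolled back.
try:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO orders (customer_id, amount) VALUES (?, ?)", (99, 10.0)
        )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Query capability: join and aggregate across tables.
totals = conn.execute(
    """
    SELECT c.name, SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
```

The database refuses the inconsistent row, and the join answers a question that no single stored record contains.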

5. NoSQL

On the NoSQL, or "not only SQL", side, we'll find solutions designed to support huge traffic loads with really good response times. However, it's important to know that we often pay for that with consistency: many NoSQL stores are eventually consistent rather than strongly consistent. NoSQL stores are non-tabular databases that provide flexibility in data types and structure, meaning we could store semi-structured data in really diverse formats like documents, key-value pairs, and graphs, among others. Another interesting thing about both SQL and NoSQL-based storage is the query capability. With blob storage, we can only fetch a file by its identifier; we can't query its actual content. That's not the case for these stores: because the data has structure, we can query it directly and get insights from it. So, they're also pretty common in the later layers of a data architecture, where end users consume the data.
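The difference between fetching by identifier and querying by content can be sketched with plain Python structures. Both `blobs` and `documents` below are toy in-memory stand-ins, and `find` is a hypothetical helper, not the API of any real NoSQL database.

```python
import json
from typing import Any, Dict, List

# Blob-style access: we can only ask for an object by its key and get bytes back.
blobs: Dict[str, bytes] = {
    "users/1.json": json.dumps({"name": "Ada", "plan": "pro"}).encode("utf-8"),
}
blob = blobs["users/1.json"]  # opaque bytes; the store knows nothing about "plan"

# Document-style access: records may have different fields, yet remain queryable.
documents: List[Dict[str, Any]] = [
    {"_id": 1, "name": "Ada", "plan": "pro", "tags": ["admin"]},
    {"_id": 2, "name": "Grace", "plan": "free"},
    {"_id": 3, "name": "Linus", "address": {"city": "Helsinki"}},  # no "plan" field
]


def find(collection: List[Dict[str, Any]], **criteria: Any) -> List[Dict[str, Any]]:
    """Return the documents whose top-level fields match every given criterion."""
    return [
        doc
        for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


pro_users = find(documents, plan="pro")
```

Note how the third document has no `plan` field at all: the store accepts it anyway, and the query simply skips it.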

6. Data warehouses and data lakes

Even though data warehouses and data lakes are more complex systems, current offerings like BigQuery and Redshift for warehouses, or blob storage for data lakes, make it really common to consider them when deciding where and how to store our data. Many companies load data directly from their operational databases into the data warehouse and keep logical separation within it, so the raw data can then be processed there into a form better suited for analytical purposes. This pattern is known as ELT, short for extract, load, transform, and it is a really common practice nowadays.
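As a rough sketch of the ELT flow, the example below uses an in-memory SQLite database as a stand-in for a warehouse like BigQuery or Redshift. The `raw_` and `analytics_` table prefixes play the role of the logical separation; the table names and the sample rows are assumptions made up for illustration.

```python
import sqlite3

# Extract: rows as they might come out of an operational database, still untyped text.
extracted = [
    ("1", "2024-01-05", "20.5"),
    ("2", "2024-01-05", "4.5"),
    ("3", "2024-01-06", "10.0"),
]

warehouse = sqlite3.connect(":memory:")

# Load: land the data unchanged in a raw area of the warehouse.
warehouse.execute(
    "CREATE TABLE raw_orders (order_id TEXT, order_date TEXT, amount TEXT)"
)
warehouse.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", extracted)

# Transform: reshape inside the warehouse, using its own SQL engine.
warehouse.execute(
    """
    CREATE TABLE analytics_daily_revenue AS
    SELECT order_date,
           COUNT(*) AS orders,
           SUM(CAST(amount AS REAL)) AS revenue
    FROM raw_orders
    GROUP BY order_date
    """
)
warehouse.commit()

daily = warehouse.execute(
    "SELECT order_date, orders, revenue FROM analytics_daily_revenue ORDER BY order_date"
).fetchall()
```

The key point is the order of the steps: the transformation runs after loading, inside the warehouse, rather than in a separate system before the data arrives.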

7. Let's practice!

Now that we know some of the main possibilities for storing our data, let's review our understanding!
