What is a column family database?

1. What is a column family database?

Congratulations on finishing chapter two! In the previous chapter, we learned the main concepts of document databases. Now, let's discover column family databases!

2. Column family databases - overview

Column family databases derive from Google's Bigtable data storage system. They store information in column families, which group related data that is frequently accessed together. Column family databases are also called wide column databases. They are well suited to handling large volumes of data.

3. Column family databases - structure

Let's analyze the structure of a column family. A column family can have multiple rows. A column family is like a table in a relational database.

7. Column family databases - structure

Each row is identified by a unique row key. Row keys are like primary keys in a relational database.

8. Column family databases - structure

Each row contains columns, but rows don't need to have the same number of columns as one another. Columns can be added to a row whenever they are needed.

9. Column family databases - structure

Each column consists of three parts: a name, a value, and a timestamp.
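
To make this structure concrete, here is a minimal in-memory sketch in Python. It is not the API of any real database: the `Column` and `ColumnFamily` classes, the `put` and `get_row` methods, and the row keys are hypothetical names chosen for illustration. A column family maps each unique row key to that row's columns, and each column holds a name, a value, and a timestamp.

```python
from dataclasses import dataclass, field
import time


@dataclass
class Column:
    """One column: a name, a value, and the timestamp of when it was written."""
    name: str
    value: object
    timestamp: float


@dataclass
class ColumnFamily:
    """Hypothetical column family: maps each unique row key to that row's columns."""
    name: str
    rows: dict = field(default_factory=dict)  # row_key -> {column_name: Column}

    def put(self, row_key, column_name, value, timestamp=None):
        # Create the row on first write; rows don't share a fixed set of columns.
        row = self.rows.setdefault(row_key, {})
        ts = time.time() if timestamp is None else timestamp
        # In this simplified sketch a new write replaces the column (no versioning).
        row[column_name] = Column(column_name, value, ts)

    def get_row(self, row_key):
        # Look a row up by its row key, much like a primary-key lookup.
        return self.rows.get(row_key, {})


users = ColumnFamily("users")
users.put("user1", "name", "Ann", timestamp=1)
users.put("user1", "email", "ann@example.com", timestamp=1)
users.put("user2", "name", "Bob", timestamp=2)

print(sorted(users.get_row("user1")))  # ['email', 'name']
print(len(users.get_row("user2")))     # 1
```

Note that `user1` and `user2` end up with different numbers of columns, which is exactly the flexibility described above.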

10. Column family databases - structure

Depending on the column family database, we may be able to specify the data types of the values, such as integers, strings, or lists.

11. Column family databases - structure

Timestamps store the date and time when the data was inserted. If we update the value of a column, a new timestamp is stored along with the column name and the new value. This versioning lets us keep multiple values for the same column while still knowing which one is the latest.
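
The versioning idea can be sketched in a few lines of Python. This is a simplified model, not how any particular database implements it: the `VersionedRow` class and its methods are hypothetical, and the timestamps are small integers for readability. Each update appends a new `(timestamp, value)` pair instead of overwriting, and the version with the highest timestamp is treated as the current value. Real databases differ in how many old versions they retain.

```python
from collections import defaultdict


class VersionedRow:
    """Hypothetical row whose columns keep every (timestamp, value) version."""

    def __init__(self):
        # column name -> list of (timestamp, value) pairs
        self.versions = defaultdict(list)

    def put(self, column_name, value, timestamp):
        # An update doesn't overwrite: it adds a new timestamped version.
        self.versions[column_name].append((timestamp, value))

    def latest(self, column_name):
        # The version with the highest timestamp is the current value.
        versions = self.versions.get(column_name)
        if not versions:
            return None
        return max(versions, key=lambda v: v[0])[1]

    def history(self, column_name):
        # All stored versions, newest first.
        return sorted(self.versions.get(column_name, []),
                      key=lambda v: v[0], reverse=True)


row = VersionedRow()
row.put("address", "12 Oak St", timestamp=100)
row.put("address", "7 Elm Ave", timestamp=200)  # a later update

print(row.latest("address"))   # 7 Elm Ave
print(row.history("address"))  # [(200, '7 Elm Ave'), (100, '12 Oak St')]
```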

12. Column family databases - example

For instance, if we wanted to store our application users' information, we could do something like this. As we can see, each row contains a different number of columns. The first row doesn't have the date_of_birth column; the second row doesn't have the address or the date_of_birth columns, and the last row doesn't have the address column. If we want any of these rows to have a new column, we can add it when we need it.
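
The users example can be approximated with plain Python dictionaries, assuming three rows. Only the `address` and `date_of_birth` columns come from the example; the row keys, the `name` column, the sample values, and the `add_column` helper are hypothetical, and timestamps are left out to keep the focus on rows having different columns.

```python
# Each row key maps only to the columns that row actually has.
users = {
    "user_1": {"name": "Ann", "address": "12 Oak St"},           # no date_of_birth
    "user_2": {"name": "Bob"},                                  # no address or date_of_birth
    "user_3": {"name": "Cleo", "date_of_birth": "1990-04-02"},  # no address
}


def add_column(family, row_key, column_name, value):
    """Hypothetical helper: add a column to one row only; other rows are unaffected."""
    family.setdefault(row_key, {})[column_name] = value


# Give the second row an address when we need it.
add_column(users, "user_2", "address", "7 Elm Ave")

for key, columns in users.items():
    print(key, sorted(columns))
# user_1 ['address', 'name']
# user_2 ['address', 'name']
# user_3 ['date_of_birth', 'name']
```

Adding the column touched only `user_2`; no schema change was needed for the other rows.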

13. Column family databases - designing

When designing a column family database, we have to start from the queries our application will run and organize the data into column families that answer those queries. It is important to know that column family databases don't support joins, so each column family needs to include all the columns a query requires.
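
A minimal Python sketch of this query-first approach, assuming a hypothetical query "show a user's orders together with the user's name". The `orders_by_user` column family, its `order:<id>` column naming, and the helper functions are all invented for illustration. Instead of joining a users table with an orders table, the user's name is duplicated into the same row as the orders (a technique usually called denormalization), so a single row-key lookup answers the query.

```python
# Hypothetical column family designed for one query:
# "show a user's orders, together with the user's name".
# Row key = user ID, because that is what the query filters on.
orders_by_user = {}


def record_order(user_id, user_name, order_id, total):
    # The user's name is copied into the row so no join is needed when reading.
    row = orders_by_user.setdefault(user_id, {"user_name": user_name})
    # One column per order; the column name encodes the order ID.
    row[f"order:{order_id}"] = total


def orders_for_user(user_id):
    # A single row-key lookup returns everything the query needs.
    row = orders_by_user.get(user_id, {})
    orders = {name.split(":", 1)[1]: value
              for name, value in row.items() if name.startswith("order:")}
    return row.get("user_name"), orders


record_order("u1", "Ann", "A100", 25.0)
record_order("u1", "Ann", "A101", 40.0)

print(orders_for_user("u1"))  # ('Ann', {'A100': 25.0, 'A101': 40.0})
```

The trade-off is duplicated data: if Ann changes her name, every row that copied it must be updated. A different query would usually get its own column family shaped for it.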

14. Popular column family databases

Here are some popular column family databases.

15. Let's practice!

Great! Let's do some exercises!