Advantages and limitations of column family databases

1. Advantages and limitations of column family databases

In this lesson, we will study the advantages and limitations of column family databases.

2. Advantages - flexibility

Column family databases are flexible. As we learned in the previous lesson, rows within the same column family can have different numbers of columns, and we can add new columns to a row when we need them. Thanks to this flexibility, we don't have to fill the new columns with default values for the existing rows, as we would in a relational database. Flexibility is a great advantage, but it shouldn't be the only criterion for choosing a column family database, especially if we don't need to handle big data. If flexibility is all we are looking for, we should also evaluate key-value and document databases.
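To make the idea of flexible rows concrete, here is a minimal sketch that models one column family as a plain Python dictionary. The `users` name, the row keys, and the column names are all made up for illustration; a real column family database stores this on disk, not in a dict.

```python
# Hypothetical in-memory model of one column family:
# row key -> {column name: value}. Not a real database API.
users = {
    "user1": {"name": "Ann", "email": "ann@example.com", "city": "Oslo"},
    "user2": {"name": "Bob"},
}

# Add a new column to a single row. The other rows are not touched,
# so no default value has to be written for them.
users["user2"]["phone"] = "555-0100"

# Rows in the same column family end up with different sets of columns.
column_counts = {row_key: len(columns) for row_key, columns in users.items()}
print(column_counts)
```

Notice that `user1` never gets a `phone` column at all, which is different from a relational table, where the new column would exist for every row.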

3. Advantages - speed

Another advantage of column family databases is speed. The related columns within the same column family are stored together on disk, so writing and retrieving that information is faster than if the data were stored in different parts of the disk.
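As a rough illustration of why storing related columns together helps, the sketch below keeps each column family in its own container, so that reading one family for a row is a single lookup. The `storage` layout, the family names, and the `read_family` helper are assumptions made for this example; they only imitate the idea and say nothing about how a particular database lays out its files.

```python
# Hypothetical layout: each column family lives in its own "file" (a dict here),
# and inside it all columns of a row sit next to each other.
storage = {
    "profile": {"user1": {"name": "Ann", "city": "Oslo"}},
    "activity": {"user1": {"last_login": "2024-01-01", "visits": 12}},
}

def read_family(row_key, family):
    """Fetch every column of one family for a row with a single lookup."""
    return storage[family].get(row_key, {})

# Reading the profile does not touch the activity data at all.
profile = read_family("user1", "profile")
print(profile)
```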

4. Advantages - scalability

Like other NoSQL databases we have studied, column family databases scale horizontally by sharding across multiple servers.
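Sharding can be sketched in a few lines: we hash the row key and use the result to pick a server. The server names and the `shard_for` function below are hypothetical, and the simple modulo scheme is an assumption chosen for brevity; real systems typically use more elaborate partitioning so that adding a server moves less data.

```python
import hashlib

# Hypothetical server names, for illustration only.
SERVERS = ["node-a", "node-b", "node-c"]

def shard_for(row_key: str) -> str:
    """Pick the server that stores a row, based on a hash of its row key."""
    digest = hashlib.md5(row_key.encode("utf-8")).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

# The same row key always maps to the same server.
for key in ["user1", "user2", "user3"]:
    print(key, "->", shard_for(key))
```

Because the placement depends only on the row key, any client can work out where a row lives without asking a central coordinator.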

5. Advantages - large volumes of data

Column family databases are designed to handle large volumes of data. This is due to the speed and horizontal scalability we mentioned earlier, as well as efficient data compression.

6. Limitations

Column family databases also have some limitations. Although this kind of database supports atomic reads and writes to a single row, it doesn't support multirow transactions. This means that if a transaction needs to span more than one row, we won't be able to perform it; for example, we can't update a row that belongs to one column family and then a row that belongs to another column family in the same transaction. Another limitation is that column family databases don't support joins or subqueries, because the data within a column family is intended to be related and retrieved together. Finally, we need to define the queries well before modeling the column families. If the queries change, we may also need to modify the column families we already designed, which can be costly.
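The last two limitations can be illustrated with a small sketch. Since there are no joins, we keep one column family per query we know in advance and write the same order into both. The `orders_by_customer` and `orders_by_product` families and the `record_order` helper are hypothetical names used only for this example.

```python
# Hypothetical column families, one per query we know in advance.
orders_by_customer = {}  # row key: customer id
orders_by_product = {}   # row key: product id

def record_order(order_id, customer_id, product_id, quantity):
    """Write the same order once per query pattern, because we cannot join later."""
    # These are two separate row writes, not one multirow transaction.
    orders_by_customer.setdefault(customer_id, {})[order_id] = {
        "product": product_id,
        "quantity": quantity,
    }
    orders_by_product.setdefault(product_id, {})[order_id] = {
        "customer": customer_id,
        "quantity": quantity,
    }

record_order("o1", "c1", "p1", 2)
record_order("o2", "c1", "p2", 1)

# Each query is answered by reading a single row from the matching family.
customer_orders = orders_by_customer["c1"]
print(sorted(customer_orders))
```

If we later needed a new query, such as orders by date, we would have to add another column family and backfill it, which is the kind of costly change mentioned above.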

7. Let's practice!

Great! Let's complete some exercises before studying the scenarios where column family databases are suitable.