
Git Filter Repo

1. Git Filter Repo

Hey! Sometimes we need to rewrite history—whether it's to remove sensitive data, clean up unnecessary files, or restructure the repository. Git Filter-Repo is a powerful tool that rewrites repository history efficiently and safely.

2. What is git filter-repo?

Git Filter-Repo is a command-line tool for rewriting Git repository history. It is a separate tool that is not bundled with Git, but the Git project recommends using it rather than filter-branch. It's quicker, more reliable, and easier to use than filter-branch. Its features allow us to remove files, replace sensitive information, rename directories, and more, all across the entire commit history.

During development we might accidentally commit sensitive information like API keys or database passwords. Leaving this data in the repository history can pose a security risk. With `git filter-repo`, we can remove this sensitive information from every commit in the repository's history. No traces of the data remain in the rewritten history, while the rest of the repository stays intact.

3. Filter-Repo process

Assume a file called `secrets.txt` was accidentally committed to our flight data pipeline project. We can use filter-repo to remove all traces of it from our git repository and its history. To use the tool, we first need to install it with pip if we haven't already done so; the package is named `git-filter-repo`. Then, we can run `git filter-repo --path secrets.txt --invert-paths`. This tells filter-repo to rewrite the history and exclude `secrets.txt` from every commit. The `--path` option specifies which paths to select, and the `--invert-paths` option inverts that selection, so every path except the ones specified is kept. After running this command, all traces of `secrets.txt` will be gone from our repository's history.

4. Filter-Repo result

The filter-repo operation affects all branches and tags, not just the current branch. Importantly, commit hashes change: every commit that contained the file gets a new hash, and so does every commit that descends from one, even if it never touched the file. This is because each Git commit depends on its parent commit's hash. The result is a clean repository with no traces of the file in any commit. We can verify this by searching for the file in our commit history; it's completely gone!

Using `git filter-repo` has several significant implications. We'll have to force push the changes to update remote repositories, since the history has been rewritten. Force pushing is disruptive, because it overwrites the history that the remote already holds. To avoid conflicts between the old and new history, all collaborators must clone a fresh copy of the repository after the remote has been updated, which can interrupt their work. Additionally, git filter-repo removes the specified files, but doesn't update references to those files in other files (for example, a script that reads `secrets.txt`), so we need to fix those manually.

5. When to use filter-repo

We use `git filter-repo` when we need to rewrite history for tasks like removing sensitive data, cleaning up large repositories, or restructuring files and directories. However, we need to be cautious because it rewrites history permanently, so we should coordinate with our team before pushing changes to shared repositories.

6. Let's practice!

Let's rewrite history using git filter-repo.