Parsing PDF files
You now need to work on another small project you have been delaying. Your company gave you some PDF files of signed contracts. The goal of the project is to create a database with the information you parse from them. Three of its columns should hold the day, month, and year when each contract was signed.
The dates appear as Signed on 05/24/2016 (05 indicates the month and 24 the day). You decide to use capturing groups to extract this information. You also want to retrieve each part so you can store it in a separate variable.
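As a minimal sketch of how capturing groups map onto the date, the snippet below runs one possible pattern against the sample sentence quoted above. The pattern and variable names are illustrative assumptions, not the exercise's reference solution. Each pair of parentheses creates a numbered group, so group 1 is the month, group 2 the day, and group 3 the year.

```python
import re

# The sample sentence quoted in the exercise
text = "Signed on 05/24/2016"

# Assumed pattern: literal words, whitespace, then three capturing groups
# (month, day, year, in the order they appear in the string)
pattern = r"Signed\son\s(\d{2})/(\d{2})/(\d{4})"

match = re.search(pattern, text)
if match:
    # .groups() returns the captured substrings as a tuple, in group order
    month, day, year = match.groups()
    print(month, day, year)  # 05 24 2016
```

Tuple unpacking on .groups() is one convenient way to put each part into its own variable; match.group(1), match.group(2), and match.group(3) return the same values one at a time.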
You decide to do a proof of concept.
The variable contract, containing the text of one contract, and the re module are already loaded in your session. You can use print() to view the data in the IPython Shell.
This exercise is part of the course Regular Expressions in Python.
Have a go at this exercise by completing this sample code.
# Write regex and scan contract to capture the dates described
regex_dates = r"____\s____\s(____)/(____)/(____)"
dates = re.____(____, ____)
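One way the template above could be filled in is sketched below. Since the real contract variable is only available in the course session, the text here is a hypothetical stand-in, and it deliberately contains two signature dates to show what re.findall returns. re.search would also fit the template if only the first date is needed.

```python
import re

# Hypothetical stand-in for the `contract` variable preloaded in the exercise
contract = (
    "This agreement is entered into by both parties. "
    "Signed on 05/24/2016 by the first party. "
    "Signed on 06/01/2016 by the second party."
)

# Template filled in: "Signed", whitespace, "on", whitespace, then
# three capturing groups for month, day, and year
regex_dates = r"Signed\son\s(\d{2})/(\d{2})/(\d{4})"

# With several groups, findall returns one tuple of captures per match
dates = re.findall(regex_dates, contract)
print(dates)  # [('05', '24', '2016'), ('06', '01', '2016')]

# Store the parts of the first signature date in separate variables
month, day, year = dates[0]
```

Because findall returns plain tuples rather than match objects, unpacking each tuple is the simplest way to fill the day, month, and year columns.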