Capstone Crawler
This exercise gives you a chance to show off what you've learned! You will write the parse function for a spider and then fill in a few blanks to finish it off. On the course directory page of DataCamp, each listed course has a title and a short course description. The spider scrapes that page to extract both. You will not need to follow any links this time. Everything you need to know is:
- The course titles are defined by the text within an h4 element whose class contains the string block__title (double underscore).
- The short course descriptions are defined by the text within a paragraph p element whose class contains the string block__description (double underscore).
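In XPath, a rule of the form "class contains the string" is usually written as a predicate such as contains(@class, "block__title"). To make concrete what that matching means, here is a minimal sketch that applies the same substring test using only Python's standard html.parser module instead of Scrapy. The sample markup, the class names with the course- prefix, and the ClassTextCollector and collect helpers are all hypothetical, invented for this illustration; the real page may look different.

```python
from html.parser import HTMLParser

# Hypothetical markup imitating two course cards; not the real DataCamp page.
SAMPLE_HTML = """
<div class="course-block">
  <h4 class="course-block__title">Intro to Python</h4>
  <p class="course-block__description">Learn the basics.</p>
</div>
<div class="course-block">
  <h4 class="course-block__title">Intro to SQL</h4>
  <p class="course-block__description">Query databases.</p>
</div>
"""


class ClassTextCollector(HTMLParser):
    """Collect the text of every `tag` element whose class contains `needle`."""

    def __init__(self, tag, needle):
        super().__init__()
        self.tag = tag
        self.needle = needle
        self.texts = []
        self._inside = False  # True while we are within a matching element

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class") or ""
        # Substring test, mirroring the idea behind contains(@class, ...)
        if tag == self.tag and self.needle in cls:
            self._inside = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == self.tag:
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.texts[-1] += data


def collect(html, tag, needle):
    """Return the stripped text of all matching elements, in document order."""
    parser = ClassTextCollector(tag, needle)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.texts]


titles = collect(SAMPLE_HTML, "h4", "block__title")
descriptions = collect(SAMPLE_HTML, "p", "block__description")
```

Because the test is a substring match, an element with class course-block__title still qualifies even though block__title is only part of its class value. This sketch does not handle nested elements of the same tag, which is fine for headings and short paragraphs like these.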
This exercise is part of the course Web Scraping in Python.
Have a go at this exercise by completing this sample code.
# parse method
def parse(self, response):
    # Extracted course titles
    crs_titles = response.xpath(____).extract()
    # Extracted course descriptions
    crs_descrs = response.xpath(____).extract()
    # Fill in the dictionary: it is the spider output
    for crs_title, crs_descr in zip(crs_titles, crs_descrs):
        dc_dict[crs_title] = crs_descr
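The final loop pairs each title with a description by position. The following small sketch shows how that pairing behaves; the lists are hypothetical stand-ins for the extracted values, with one description deliberately missing.

```python
# Hypothetical extracted values standing in for crs_titles and crs_descrs.
crs_titles = ["Intro to Python", "Intro to SQL", "Intro to R"]
crs_descrs = ["Learn the basics.", "Query databases."]  # one description missing

dc_dict = {}
# zip pairs items by position and stops at the end of the shorter list
for crs_title, crs_descr in zip(crs_titles, crs_descrs):
    dc_dict[crs_title] = crs_descr
```

Here "Intro to R" is silently dropped because zip stops once the shorter list is exhausted. The pairing is also only correct if both lists are in the same order with no gaps; a description missing from the middle of the page would shift every later pair. Comparing len(crs_titles) with len(crs_descrs) before building the dictionary is a cheap sanity check.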