Exercise

Capstone Crawler

This exercise gives you a chance to show off what you've learned! In this exercise, you will write the parse function for a spider and then fill in a few blanks to finish off the spider. On the course directory page of DataCamp, each listed course has a title and a short course description. This spider will be used to scrape the course directory to extract the course titles and short course descriptions. You will not need to follow any links this time. Everything you need to know is:

  • The course titles are defined by the text within an h4 element whose class contains the string block__title (double underscore).
  • The short course descriptions are defined by the text within a paragraph p element whose class contains the string block__description (double underscore).

Instructions 1/2

  • Assign to the variable crs_titles the extracted list of course titles on the DataCamp course directory page. You should use the contains call within your XPath, and your XPath string should point to the text of the selected objects.
  • Assign to the variable crs_descrs the extracted list of short course descriptions. You should use the contains call within your XPath, and your XPath string should point to the text of the selected objects.

(Since we want a list of extracted data, we will use the extract() call rather than extract_first().)
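
One possible shape for the two assignments is sketched below. It is not the official solution: the helper name extract_courses and the two constant names are made up for illustration, and the response argument is only assumed to behave like a Scrapy response (an xpath() method whose result has extract()). The XPath strings themselves follow directly from the two facts listed above.

```python
# XPath strings built from the two facts given in the exercise text.
# contains(@class, "...") matches elements whose class attribute includes
# the given substring; the trailing /text() points at the text nodes.
TITLE_XPATH = '//h4[contains(@class, "block__title")]/text()'
DESCR_XPATH = '//p[contains(@class, "block__description")]/text()'


def extract_courses(response):
    """Return (crs_titles, crs_descrs) from a Scrapy-style response.

    Hypothetical helper: `response` is assumed to expose .xpath(query),
    returning an object with .extract(), as Scrapy responses do.
    """
    # extract() returns every match as a list of strings,
    # whereas extract_first() would return only the first match.
    crs_titles = response.xpath(TITLE_XPATH).extract()
    crs_descrs = response.xpath(DESCR_XPATH).extract()
    return crs_titles, crs_descrs
```

Inside a spider, the same two lines would sit in the body of the parse method, where response is the object Scrapy passes in.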