
Capstone Crawler

This exercise gives you a chance to show off what you've learned! You will write the parse function for a spider and then fill in a few blanks to finish the spider. On DataCamp's course directory page, each listed course has a title and a short course description. The spider scrapes that directory and extracts the title and short description of every course. You will not need to follow any links this time. Everything you need to know is:

  • The course titles are defined by the text within an h4 element whose class contains the string block__title (double underscore).
  • The short course descriptions are defined by the text within a p (paragraph) element whose class contains the string block__description (double underscore).

This exercise is part of the course

Web Scraping in Python


Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# parse method
def parse(self, response):
  # Extracted course titles
  crs_titles = response.xpath(____).extract()
  # Extracted course descriptions
  crs_descrs = response.xpath(____).extract()
  # Fill in the dictionary: it is the spider output
  for crs_title, crs_descr in zip(crs_titles, crs_descrs):
    dc_dict[crs_title] = crs_descr