Capstone Crawler
This exercise gives you a chance to show off what you've learned! In this exercise, you will write the parse function for a spider and then fill in a few blanks to finish off the spider. On the course directory page of DataCamp, each listed course has a title and a short course description. This spider will be used to scrape the course directory to extract the course titles and short course descriptions. You will not need to follow any links this time. Everything you need to know is:
- The course titles are defined by the text within an
h4element whose class contains the stringblock__title(double underline). - The short course descriptions are defined by the text within a paragraph
pelement whose class contains the stringblock__description(double underline).
Bu egzersiz
Web Scraping in Python
kursunun bir parçasıdırUygulamalı interaktif egzersiz
Bu örnek kodu tamamlayarak bu egzersizi bitirin.
# parse method
def parse(self, response):
# Extracted course titles
crs_titles = response.xpath(____).extract()
# Extracted course descriptions
crs_descrs = response.xpath(____).extract()
# Fill in the dictionary: it is the spider output
for crs_title, crs_descr in zip(crs_titles, crs_descrs):
dc_dict[crs_title] = crs_descr