The Finale

1. Stop Scratching and Start Scraping!

Congratulations! It's like the age-old joke: "How many students does it take to scrape DataCamp?" "How many?" "Just you!" You've finished the course, and we'll spend this short video relishing our new knowledge and accomplishments.

2. Feeding the Machine

One meta-theme I want you to leave with is this: many courses in data science deal with how to process data that has already been collected. Supervised versus unsupervised learning, clustering, deep learning, and so on all deal with how to process a collection of data, but these courses and texts rarely deal with how we gather the data itself. This course, by contrast, has given you one important method for acquiring the data in the first place, from which you can then begin to consider how to analyze and process it. Getting data to analyze is an extremely important step in data analysis!

3. Scraping Skills

Rather than listing everything we've learned from start to finish, let's consider it in the opposite order, in the context that motivated us to learn this material in the first place. At the top level, we have identified a website or collection of websites with information we want to collect and eventually process. Because of the number of sites, the amount of material on them, or some other reason, it would be much easier to do this scraping computationally rather than by hand. So, we decide to write some code to scrape for us. Since we are familiar with Python, we choose Scrapy, because it has all the tools we need to scrape single sites and also to create spiders that can crawl between multiple sites. Now, since we are going to use Scrapy, we learn how to manipulate its Selector and Response objects, particularly to extract the data we want to collect. But we should also learn how to tell Scrapy which elements to select. For this, we should learn either XPath or CSS Locator notation. And to make sense of XPath or CSS Locator notation, we really should understand the structure of HTML.

4. What'd'ya Know?

Notice that you have now learned all these steps in the order you needed to reach the original objective of scraping a site computationally. You have a usable mental model of the structure of HTML. You can translate that knowledge of HTML into a usable XPath or CSS Locator string. You can use `Selector` and `Response` objects in Scrapy to navigate to and extract the desired information from a website. You can even build a spider to crawl multiple sites! In summary, you now know how to scrape the web in Python.

5. EOT

Thank you again for sticking with me through this entire course. And, again, congratulations!
