
Recap: Web Scraping in R

1. Recap: Web Scraping in R

That's it – you reached the end of this course. Congratulations, you did great! Let's quickly recap the concepts you've been introduced to.

2. Concepts covered

In the first two chapters of this course, you were introduced to the technologies that structure modern websites: HTML and CSS. You also got to know the rvest package, which makes it much easier to apply so-called selectors to extract content from a web page. This parsing step is central to scraping, and the more you know about it, the easier scraping will become for you.

In the third chapter, you got to know an alternative approach to selection with CSS: the XPath notation. While CSS selectors already cover a lot of scraping techniques, XPath is a little more powerful, for example through functions like position() or text(). Another very helpful feature of XPath is the ability to select nodes based on their surrounding nodes. Trust me, this is especially helpful if the website doesn't make use of a lot of CSS and is generally badly structured.

Lastly, you got to know the basic protocol behind all requests to a web server: HTTP. Through the httr package, you learned how you can customize these requests and therefore be a bit nicer to the pages you're scraping.
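As a minimal sketch of how these pieces fit together, the snippet below sends a polite request with httr, then extracts the same elements once via a CSS selector and once via XPath with position(). The URL, user-agent string, and selector names are placeholders, not from the course:

```r
library(rvest)
library(httr)

# Be polite: identify yourself with a custom user agent (hypothetical contact)
response <- GET(
  "https://example.com/articles",
  user_agent("my-scraper; contact: me@example.com")
)

# Parse the response body into an HTML document
page <- read_html(response)

# CSS selector: extract all article titles (hypothetical class name)
titles <- page %>%
  html_elements(".article-title") %>%
  html_text2()

# XPath equivalent, using position() to keep only the first three matches
top_titles <- page %>%
  html_elements(xpath = "//h2[@class = 'article-title'][position() <= 3]") %>%
  html_text2()
```

Note that read_html() accepts the httr response object directly, so the customized request and the parsing step plug together without any manual decoding.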

3. What to do with the scraped data?

Since data collected from the web are often quite messy, you should also know how to process, tidy, and visualize these datasets. Don't worry, DataCamp has got you covered!

4. Happy scraping!

With that, it's time to say goodbye and wish you happy scraping! But remember: be nice, be gentle!