
A content extractor

In the previous exercises, you established that all the elements of the URLs vector you were given return a 200 status code. Now that you know they are accessible, you will dig deeper into web scraping by doing some content extraction.

To do this, we'll use functions from the rvest package, which we will prefill with partial(). The function we write in this exercise will extract all the H2 HTML nodes from a page — on a webpage, these H2 nodes correspond to the level 2 headers. Once we have extracted these headers, the html_text() function will be used to extract the text content from the raw HTML.

purrr and rvest have been loaded for you, and the urls vector is available in your workspace.
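To see what prefilling with partial() looks like before tackling the exercise, here is a minimal sketch that runs on an inline HTML string instead of a live URL (the string and its headers are made up for illustration; rvest and purrr are assumed to be installed):

```r
library(purrr)
library(rvest)

# read_html() can parse a literal HTML string, so no network is needed
page <- read_html("<html><body><h1>Title</h1><h2>First</h2><h2>Second</h2></body></html>")

# partial() returns a copy of html_nodes() with the css argument pre-filled
get_h2 <- partial(html_nodes, css = "h2")

# Extract the H2 nodes, then pull out their text content
html_text(get_h2(page))
# [1] "First"  "Second"
```

Note that get_h2() still takes the page as its remaining argument, which is what makes it easy to slot between read_html() and html_text() later.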

This exercise is part of the course

Intermediate Functional Programming with purrr


Instructions

  • Start by prefilling the html_nodes() with css = "h2".

  • Compose this newly created function with read_html and html_text (read_html applied first, html_text last) to create a text extractor for H2 headers.

  • Run this function on the urls vector, and name the result.

  • Print the result to see what it looks like.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Prefill html_nodes() with the css param set to h2
get_h2 <- ___(html_nodes, ___)

# Combine the html_text, get_h2 and read_html functions
get_content <- ___(___, ___, ___)

# Map get_content over the urls vector
res <- ___(urls, ___) %>%
  set_names(___)

# Print the results to the console
___
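For reference, one way the blanks could be filled in is sketched below. This is a hedged solution, not the only valid one; it assumes urls is the character vector of accessible pages from the previous exercises, and that naming the results after the URLs themselves is acceptable:

```r
library(purrr)
library(rvest)

# Prefill html_nodes() with the css param set to h2
get_h2 <- partial(html_nodes, css = "h2")

# compose() applies its functions right-to-left by default:
# read_html() runs first, then get_h2(), then html_text()
get_content <- compose(html_text, get_h2, read_html)

# Map get_content over the urls vector and name each result by its URL
res <- map(urls, get_content) %>%
  set_names(urls)

# Print the results to the console
res
```

Because compose() is right-to-left, the order of arguments mirrors a pipeline read backwards; listing them as read_html, get_h2, html_text would parse the raw URL string with html_text() first and fail.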