Back at the office

You are still working as a data analyst at a web agency, and you have been asked to do some web scraping. You were given a list of URLs to analyze, an analysis you already started in the previous chapter.

You expect this task to come back: you will no doubt be asked to do it again in a few weeks. To make your future work easier, you have decided to write clean code today, so that you can return to it more easily later.

We'll start by combining the two httr functions we saw in the previous chapter: GET() to retrieve the web page, and status_code() to extract the status code. Composed together, they give us a status code extractor.

The urls vector is still available in your workspace. We have kept only the URLs that are reachable.
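As a reminder of how composition order works, here is a minimal offline sketch. The helpers fake_get() and fake_status() are hypothetical stand-ins for GET() and status_code(), invented here so that the example runs without network access. It assumes purrr's default behavior, where compose() applies its functions from right to left.

```r
library(purrr)

# Hypothetical stand-in for httr::GET(): returns a list that mimics a response
fake_get <- function(url) {
  list(url = url, status_code = 200L)
}

# Hypothetical stand-in for httr::status_code(): pulls the code out of the response
fake_status <- function(response) {
  response$status_code
}

# compose() runs right to left by default: fake_get() first, then fake_status()
fake_extract <- compose(fake_status, fake_get)

# Calling the composed function is equivalent to fake_status(fake_get(url))
result <- fake_extract("https://example.com")
```

The function listed last is the one that receives the original input, which is why the fetching step sits on the right.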

This exercise is part of the course

Intermediate Functional Programming with purrr

Exercise instructions

Load purrr and httr.
Compose a status extractor with GET() and status_code().
Try this new function on "https://www.thinkr.fr" and "https://en.wikipedia.org".
Apply this function directly to the urls vector.

Hands-on interactive exercise

Try this exercise by completing the sample code below.

# Launch purrr and httr



# Compose a status extractor 
status_extract <- ___(___, ___)

# Try with "https://thinkr.fr" & "https://en.wikipedia.org"
___("https://thinkr.fr")
___("https://en.wikipedia.org")

# Map it on the urls vector, return a vector of numbers
___(urls, ___)
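One possible way to fill in the blanks is sketched below. It is an assumption about a workable completion, not the course's official solution. The wrapper extract_all() is a hypothetical helper added for illustration, and map_dbl() is chosen here because the exercise asks for a vector of numbers.

```r
library(purrr)
library(httr)

# Compose right to left: GET() fetches the page, status_code() reads the code
status_extract <- compose(status_code, GET)

# Hypothetical helper: map the extractor over a character vector of URLs
# and return a numeric vector (performs real HTTP requests when called)
extract_all <- function(urls) {
  map_dbl(urls, status_extract)
}
```

Calling status_extract("https://en.wikipedia.org") or extract_all(urls) sends real HTTP requests, so the values returned depend on the network and on the state of each site at that moment.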


