Identifying profiles

We are still exploring our dataset of tweets. The tweets are stored in a nested list of 5055 sublists, which we explore with purrr.

In this exercise we answer a question about user behavior: how many users have only retweeted, without ever posting "original content"? A rule of thumb on Twitter is that roughly 80% of people only retweet while 20% publish content, in line with the Pareto principle. We will check that in this exercise.

To do so, we split the dataset in two, then count how many users there are in total and how many belong only to the "retweet only" group.

purrr has been loaded for you, and the rstudioconf list is still available in your workspace.

This exercise is part of the course

Intermediate Functional Programming with purrr

Exercise instructions

Create a sublist of retweets, extract the user_id element, and remove the duplicates with unique().
Create a sublist of original tweets, extract the user_id element, and remove the duplicates with unique().
Combine union() (from base R) and length() to determine the total number of users.
Use the setdiff() function (from base R) to get the users who appear only in the retweet sublist.
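
The two base R set operations in the last two instructions can be illustrated with a minimal sketch on small made-up vectors. The IDs and variable names below are hypothetical and are not taken from rstudioconf. union() returns every distinct value found in either vector, while setdiff(x, y) returns the values of x that do not appear in y.

```r
# Hypothetical user IDs, for illustration only
rt_ids     <- c("u1", "u2", "u3")  # users with at least one retweet
non_rt_ids <- c("u3", "u4")        # users with at least one original tweet

# Every distinct user across both groups
all_users <- union(rt_ids, non_rt_ids)
length(all_users)

# Users who appear in the retweet group but not in the original-tweet group
only_rt <- setdiff(rt_ids, non_rt_ids)
length(only_rt)
```

Note that the argument order matters for setdiff(): swapping the two vectors would instead return the users who only posted original tweets.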

Hands-on interactive exercise

Try this exercise by completing the sample code.

# Keep the retweets, extract the user_id, remove the duplicates
rt <- ___(___, "is_retweet") %>%
  ___("user_id") %>% 
  ___()

# Discard the retweets, extract the user_id, remove the duplicates
non_rt <- ___(rstudioconf, "is_retweet") %>%
  ___("user_id") %>% 
  ___()

# Determine the total number of users
___(rt, non_rt) %>% ___()

# Determine the number of users who have only retweeted
___(rt, non_rt) %>% ___()
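
As a rough sketch of how such a pipeline behaves, the same steps can be run on a tiny made-up list. Everything here is an assumption for illustration: toy_tweets is a hypothetical stand-in for rstudioconf, each tweet is reduced to a user_id and a logical is_retweet, and map_chr() is one possible way to extract the IDs (it assumes each user_id is a single string).

```r
library(purrr)

# Hypothetical stand-in for rstudioconf: one sublist per tweet
toy_tweets <- list(
  list(user_id = "u1", is_retweet = TRUE),
  list(user_id = "u1", is_retweet = TRUE),
  list(user_id = "u2", is_retweet = TRUE),
  list(user_id = "u2", is_retweet = FALSE),
  list(user_id = "u3", is_retweet = FALSE)
)

# keep() retains the tweets whose is_retweet element is TRUE
rt <- keep(toy_tweets, "is_retweet") %>%
  map_chr("user_id") %>%
  unique()

# discard() drops those same tweets, leaving the original ones
non_rt <- discard(toy_tweets, "is_retweet") %>%
  map_chr("user_id") %>%
  unique()

# Total number of distinct users, and users who only ever retweeted
n_users   <- union(rt, non_rt) %>% length()
n_only_rt <- setdiff(rt, non_rt) %>% length()

# Share of users who only retweeted, to compare against the 80% rule of thumb
n_only_rt / n_users
```

In this toy data, user u2 has both a retweet and an original tweet, so u2 is counted in the total but not in the retweet-only group.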

Do lambda functions, mappers, and predicates sound scary to you? Fear no more! After refreshing your purrr memory, we will dive into functional programming 101, discover anonymous functions and predicates, and see how we can use them to clean and explore data.

Exercise 1: purrr basics - a refresher
Exercise 2: Refreshing your purrr memory
Exercise 3: Another purrr refresher
Exercise 4: Introduction to mappers
Exercise 5: Creating lambda functions
Exercise 6: Lambda functions
Exercise 7: Using mappers to clean up your data
Exercise 8: Clean up your data with keep
Exercise 9: Split up with keep() and discard()
Exercise 10: Predicates
Exercise 11: What is a predicate?
Exercise 12: Exploring data with predicates

Ready to go deeper with functional programming and purrr? In this chapter, we'll discover the concept of functional programming, explore error handling using safely() and possibly(), and introduce the function compact() for cleaning your code.

Exercise 1: Functional programming in R
Exercise 2: Everything that happens is a function call
Exercise 3: Identifying pure functions
Exercise 4: Tools for functional programming in purrr
Exercise 5: Safe iterations
Exercise 6: Create a function
Exercise 7: Using possibly()
Exercise 8: A possibly() version of read_lines()
Exercise 9: Everything in one call
Exercise 10: Handling adverb results
Exercise 11: Purrrfecting our function
Exercise 12: Extracting status codes with GET()

In this chapter, we'll use purrr to write code that is clearer, cleaner, and easier to maintain. We'll learn how to write clean functions with compose() and negate(). We'll also use partial() to compose functions by "prefilling" arguments from existing functions. Lastly, we'll introduce list-columns, which are a convenient data structure that helps us write clean code using the Tidyverse.

Exercise 1: Why cleaner code?
Exercise 2: How to write compose()
Exercise 3: Back to the office
Exercise 4: Building functions with compose() and negate()
Exercise 5: Build a function
Exercise 6: Count the NA
Exercise 7: Prefilling functions
Exercise 8: A content extractor
Exercise 9: Another extractor
Exercise 10: List columns
Exercise 11: About list-columns
Exercise 12: Create a list-column data.frame

We'll wrap up everything we know about purrr in a case study. Here, we'll use purrr to analyze data that has been scraped from Twitter. We'll use clean code to organize the data and then we'll identify Twitter influencers from the 2018 RStudio conference.

Exercise 1: Exploring the dataset
Exercise 2: Playing with tweets, round 1
Exercise 3: Identifying profiles (current exercise)
Exercise 4: Extracting information from the dataset
Exercise 5: Counting favorites
Exercise 6: Extracting mentions
Exercise 7: Manipulating URLs
Exercise 8: Analyzing URLs
Exercise 9: Playing with URLs
Exercise 10: Identifying influencers
Exercise 11: Splitting the dataset
Exercise 12: We have a winner!
Exercise 13: Congratulations!