
Leverage the uniqueness of IDs

As you know, an ID should be unique within a web page. When that holds, a single ID selector can replace a longer chain of type and class selectors, which reduces the complexity of your scraping selectors drastically.
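To make the difference concrete, here is a minimal sketch on a made-up page (the HTML, the IDs `intro` and `details`, and the variable names are all assumptions for illustration, not part of the exercise). It compares a selector that relies on element position with one that relies on an ID:

```r
library(rvest)

# Hypothetical page used only for this illustration
page <- read_html("
<html>
  <body>
    <div id='intro'><p class='note'>Hello</p></div>
    <div id='details'><p class='note'>World</p></div>
  </body>
</html>")

# Position-based selector: stops matching if another div is inserted before it
by_position <- page %>%
  html_elements("body > div:nth-child(2) > p")

# ID-based selector: shorter, and independent of the div's position
by_id <- page %>%
  html_elements("#details > p")

html_text(by_position)
html_text(by_id)
```

Both calls should return the same paragraph, but only the ID-based selector keeps working if the surrounding layout changes.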

Here's the structure of an HTML page you might encounter in the wild:

<html>
  <body>
    <div id = 'first'>
      <h1 class = 'big'>Joe Biden</h1>
      <p class = 'first blue'>Democrat</p>
      <p class = 'second blue'>Male</p>
    </div>
    <div id = 'second'>...</div>
    <div id = 'third'>
      <h1 class = 'big'>Donald Trump</h1>
      <p class = 'first red'>Republican</p>
      <p class = 'second red'>Male</p>
    </div>
  </body>
</html>

It has been read in for you with read_html() and is available through structured_html.
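If you want to reproduce the setup outside the exercise, the following sketch shows one way `structured_html` could be created. It assumes the HTML above is stored in a string (the name `html_source` is hypothetical), and it keeps the elided content of the second div as it appears above:

```r
library(rvest)

# The page from above as a string; the second div's content is elided in the source
html_source <- "
<html>
  <body>
    <div id = 'first'>
      <h1 class = 'big'>Joe Biden</h1>
      <p class = 'first blue'>Democrat</p>
      <p class = 'second blue'>Male</p>
    </div>
    <div id = 'second'>...</div>
    <div id = 'third'>
      <h1 class = 'big'>Donald Trump</h1>
      <p class = 'first red'>Republican</p>
      <p class = 'second red'>Male</p>
    </div>
  </body>
</html>"

# Parse the string into a document that html_elements() can query
structured_html <- read_html(html_source)

# Quick sanity check: extract the text of all h1 headings
headings <- structured_html %>%
  html_elements("h1") %>%
  html_text()
headings
```

The sanity check uses a plain type selector on purpose, so the exercise itself is left for you to solve.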

This exercise is part of the course

Web Scraping in R

Exercise instructions

  • Using html_elements(), find the shortest possible selector to select the first div in structured_html.

Interactive exercise

Complete the sample code to successfully finish this exercise.

# Select the first div
structured_html %>%
  ___