Exercise

Select direct descendants with the child combinator

By now, you know how to select elements by type, class, or ID. These selectors are not always enough, though: for example, when you want only the direct descendants (children) of the top-level ul element and not the items nested deeper. For that, use the child combinator (>) introduced in the video.
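To make the difference concrete, here is a minimal sketch on a made-up snippet (the `snippet` object and its markup are hypothetical and not part of the exercise). It assumes rvest 1.0.0 or later, where `html_elements()` is available; older versions use `html_nodes()` instead.

```r
library(rvest)

# Hypothetical snippet, not from the exercise: one paragraph is a direct
# child of the outer div, the other sits one level deeper in a nested div.
snippet <- read_html("
  <div id='outer'>
    <p>direct child</p>
    <div><p>nested deeper</p></div>
  </div>")

# Descendant combinator (a space): matches p at any depth below div#outer.
all_p <- snippet %>% html_elements("div#outer p") %>% html_text()
print(all_p)

# Child combinator (>): matches only p elements whose parent is div#outer.
child_p <- snippet %>% html_elements("div#outer > p") %>% html_text()
print(child_p)
```

The first selector should return both paragraphs, while the second should return only the one directly under the outer div.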

Here, your goal is to scrape a list (contained in the languages_html document) of all mentioned computer languages, but without the accompanying information in the sub-bullets:

  <ul id = 'languages'>
    <li>SQL</li>
    <ul>    
      <li>Databases</li>
      <li>Query Language</li>
    </ul>
    <li>R</li>
    <ul>
      <li>Collection</li>
      <li>Analysis</li>
      <li>Visualization</li>
    </ul>
    <li>Python</li>
  </ul>

Instructions 1/2

  1. First, gather all the li elements in the nested list shown above and print their text.

  2. Next, extract only the direct descendants of the top-level ul element, using the child combinator.
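One possible way to work through both steps is sketched below. In the exercise, `languages_html` is already loaded; here it is rebuilt from the HTML shown above with `read_html()`, which is an assumption made only so the sketch runs on its own. The variable names `all_items` and `top_level` are likewise illustrative. As before, `html_elements()` assumes rvest 1.0.0 or later.

```r
library(rvest)

# Assumption: recreate the preloaded document from the HTML in the exercise.
languages_html <- read_html("
  <ul id = 'languages'>
    <li>SQL</li>
    <ul>
      <li>Databases</li>
      <li>Query Language</li>
    </ul>
    <li>R</li>
    <ul>
      <li>Collection</li>
      <li>Analysis</li>
      <li>Visualization</li>
    </ul>
    <li>Python</li>
  </ul>")

# Step 1: select every li element, including those in the nested lists.
all_items <- languages_html %>%
  html_elements("li") %>%
  html_text()
print(all_items)

# Step 2: select only li elements whose parent is the top-level ul.
top_level <- languages_html %>%
  html_elements("ul#languages > li") %>%
  html_text()
print(top_level)
```

The first call should list all eight items, sub-bullets included. The second should keep only the three language names, because the sub-bullet li elements have one of the inner ul elements as their parent rather than ul#languages.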