
Parsing HTML with BeautifulSoup

In this interactive exercise, you'll learn how to use the BeautifulSoup package to parse, prettify and extract information from HTML. You'll scrape data from the webpage of Guido van Rossum, Python's creator and its former Benevolent Dictator for Life. In this exercise you'll prettify the HTML; in the ones that follow, you'll extract the text and the hyperlinks.

The URL of interest is url = 'https://www.python.org/~guido/'.
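As a rough preview of the text and hyperlink extraction mentioned above, here is a minimal sketch that runs on a made-up HTML snippet rather than the real page. The snippet, its URLs and the variable names are assumptions for illustration only; get_text(), find_all() and get() are standard BeautifulSoup calls.

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration; this is not the content of the real page
sample_html = (
    '<html><body><p>Read the <a href="https://example.com/docs">docs</a> '
    'or the <a href="/faq.html">FAQ</a>.</p></body></html>'
)

# Parse the string with Python's built-in HTML parser
soup = BeautifulSoup(sample_html, "html.parser")

# Extract the visible text with all tags stripped
page_text = soup.get_text()

# Collect the href attribute of every <a> tag
hrefs = [a_tag.get("href") for a_tag in soup.find_all("a")]

print(page_text)
print(hrefs)
```

Using get("href") rather than a_tag["href"] returns None for anchors without an href instead of raising a KeyError.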

This exercise is part of the course

Intermediate Importing Data in Python


Exercise instructions

  • Import the function BeautifulSoup from the package bs4.
  • Assign the URL of interest to the variable url.
  • Package the request to the URL, send it and catch the response with a single call to requests.get(), assigning the response to the variable r.
  • Use the text attribute of the object r to return the HTML of the webpage as a string; store the result in a variable html_doc.
  • Create a BeautifulSoup object soup from the resulting HTML using the function BeautifulSoup().
  • Use the method prettify() on soup and assign the result to pretty_soup.
  • Hit submit to print the prettified HTML to your shell!
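The steps above can be sketched roughly as follows. This is one possible arrangement, not the course's reference solution: the helper names fetch_html and prettify_html, the timeout value, the raise_for_status() check and the explicit "html.parser" argument are all assumptions added for illustration. The demonstration at the bottom runs on a made-up snippet so that it works without network access.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Send a GET request and return the response body as a string.

    Hypothetical helper; the timeout is an assumed value.
    """
    r = requests.get(url, timeout=timeout)
    r.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return r.text


def prettify_html(html_doc):
    """Parse an HTML string and return an indented version of it."""
    # Naming the parser explicitly avoids bs4's "no parser specified" warning
    soup = BeautifulSoup(html_doc, "html.parser")
    return soup.prettify()


# Offline demonstration on a made-up snippet (not the real page)
sample_html = "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"
pretty_soup = prettify_html(sample_html)
print(pretty_soup)
```

To run the same steps against the exercise's page, you could call prettify_html(fetch_html(url)) with the url given above.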

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Import packages
import requests
from ____ import ____

# Specify url: url


# Package the request, send the request and catch the response: r


# Extract the response as HTML: html_doc


# Create a BeautifulSoup object from the HTML: soup


# Prettify the BeautifulSoup object: pretty_soup


# Print the prettified HTML
print(pretty_soup)