Turning a webpage into data using BeautifulSoup: getting the text

As promised, the following exercises cover the basics of extracting information from HTML soup. In this exercise, you'll extract the text from the webpage of Guido van Rossum, Python's BDFL (Benevolent Dictator for Life), and print the webpage's title.

This exercise is part of the course

Intermediate Importing Data in Python

Exercise instructions

  • In the sample code, the HTML string html_doc has already been created from the response: your first task is to Soupify it using the function BeautifulSoup() and to assign the resulting soup to the variable soup.
  • Extract the title from the HTML soup soup using the attribute title and assign the result to guido_title.
  • Print the title of Guido's webpage to the shell using the print() function.
  • Extract the text from the HTML soup soup using the method get_text() and assign the result to guido_text.
  • Hit submit to print the text from Guido's webpage to the shell.
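
The steps above can be tried offline first. The following is a minimal sketch, not the exercise solution: the HTML string and the names sample_title and sample_text are made up for illustration, and the parser argument "html.parser" (Python's built-in parser) is an assumption, since the exercise does not say which parser to use.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the html_doc string that the exercise
# obtains from r.text; using a literal avoids a network request.
html_doc = """
<html>
  <head><title>Sample Home Page</title></head>
  <body>
    <h1>Sample Heading</h1>
    <p>First paragraph.</p>
  </body>
</html>
"""

# Soupify the HTML string. "html.parser" is an assumed parser choice.
soup = BeautifulSoup(html_doc, "html.parser")

# The title attribute returns the <title> element, tags included.
sample_title = soup.title
print(sample_title)

# get_text() returns the document's text with all tags stripped.
sample_text = soup.get_text()
print(sample_text)
```

Note that soup.title gives the whole element, so the printed title still includes its tags, whereas get_text() returns only the text between tags.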

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Import packages
import requests
from bs4 import BeautifulSoup

# Specify url: url
url = 'https://www.python.org/~guido/'

# Package the request, send the request and catch the response: r
r = requests.get(url)

# Extract the response as html: html_doc
html_doc = r.text

# Create a BeautifulSoup object from the HTML: soup


# Get the title of Guido's webpage: guido_title


# Print the title of Guido's webpage to the shell


# Get Guido's text: guido_text


# Print Guido's text to the shell
print(guido_text)