Turning a webpage into data using BeautifulSoup: getting the text
As promised, the following exercises teach you the basics of extracting information from HTML soup. In this exercise, you'll extract the text from the webpage of Guido van Rossum, Python's creator and BDFL ("Benevolent Dictator For Life"), and print the webpage's title.
This exercise is part of the course Intermediate Importing Data in Python.
Exercise instructions
- In the sample code, html_doc, the HTML text of the response, has already been created. Your first task is to Soupify it using the function BeautifulSoup() and to assign the resulting soup to the variable soup.
- Extract the title from the HTML soup soup using the attribute title and assign the result to guido_title.
- Print the title of Guido's webpage to the shell using the print() function.
- Extract the text from the HTML soup soup using the method get_text() and assign the result to guido_text.
- Hit submit to print the text from Guido's webpage to the shell.
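The three BeautifulSoup operations in these instructions can be tried without any network access. The sketch below is a minimal, offline illustration: it assumes a small hypothetical HTML string in place of the real html_doc fetched from Guido's page, and it passes "html.parser" (Python's built-in parser) explicitly rather than letting BeautifulSoup choose a parser on its own.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for html_doc; in the exercise it comes from requests
html_doc = """
<html>
  <head><title>Example Home Page</title></head>
  <body>
    <h1>Welcome</h1>
    <p>Some <b>bold</b> text.</p>
  </body>
</html>
"""

# Soupify: parse the HTML string with Python's built-in parser
soup = BeautifulSoup(html_doc, "html.parser")

# The title attribute returns the <title> Tag object, markup included
guido_title = soup.title
print(guido_title)  # prints <title>Example Home Page</title>

# get_text() concatenates every text node in the document, title included
guido_text = soup.get_text()
print(guido_text)
```

Note that soup.title is a Tag, not a plain string, which is why printing it shows the surrounding <title> tags.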
Have a go at this exercise by completing this sample code.
# Import packages
import requests
from bs4 import BeautifulSoup
# Specify url: url
url = 'https://www.python.org/~guido/'
# Package the request, send the request and catch the response: r
r = requests.get(url)
# Extract the response as html: html_doc
html_doc = r.text
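# Create a BeautifulSoup object from the HTML: soup
# (naming the parser explicitly avoids bs4's parser-guessing warning)
soup = BeautifulSoup(html_doc, 'html.parser')
# Get the title of Guido's webpage: guido_title
guido_title = soup.title
# Print the title of Guido's webpage to the shell
print(guido_title)
# Get Guido's text: guido_text
guido_text = soup.get_text()
# Print Guido's text to the shell
print(guido_text)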