Turning a webpage into data using BeautifulSoup: getting the text
As promised, in the following exercises, you'll learn the basics of extracting information from HTML soup. In this exercise, you'll extract the text from the webpage of Guido van Rossum, Python's BDFL, and print the webpage's title.
This exercise is part of the course Intermediate Importing Data in Python.
Exercise instructions
- In the sample code, the HTML response object html_doc has already been created: your first task is to Soupify it using the function BeautifulSoup() and to assign the resulting soup to the variable soup.
- Extract the title from the HTML soup soup using the attribute title and assign the result to guido_title.
- Print the title of Guido's webpage to the shell using the print() function.
- Extract the text from the HTML soup soup using the method get_text() and assign to guido_text.
- Hit submit to print the text from Guido's webpage to the shell.
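To see what the title attribute and the get_text() method give back, it can help to try them on a tiny document first. The sketch below is only an illustration: sample_html, sample_soup, sample_title and sample_text are made-up names, the HTML is a stand-in rather than Guido's page, and passing "html.parser" explicitly is a choice made here, not something the exercise requires.

```python
from bs4 import BeautifulSoup

# Stand-in HTML (an assumption for illustration, not Guido's actual page)
sample_html = (
    "<html><head><title>Sample page</title></head>"
    "<body><p>Hello, soup.</p></body></html>"
)

# Name the parser explicitly; "html.parser" ships with Python itself
sample_soup = BeautifulSoup(sample_html, "html.parser")

# .title returns the <title> element as a Tag, so printing it shows the markup
sample_title = sample_soup.title
print(sample_title)

# .get_text() returns a plain string with the tags stripped out
sample_text = sample_soup.get_text()
print(sample_text)
```

Printing the title shows the whole element, tags included, while get_text() leaves only the text that sat between the tags.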
Hands-on interactive exercise
Complete this sample code to finish the exercise.
# Import packages
import requests
from bs4 import BeautifulSoup
# Specify url: url
url = 'https://www.python.org/~guido/'
# Package the request, send the request and catch the response: r
r = requests.get(url)
# Extract the response as html: html_doc
html_doc = r.text
# Create a BeautifulSoup object from the HTML: soup
# Get the title of Guido's webpage: guido_title
# Print the title of Guido's webpage to the shell
# Get Guido's text: guido_text
# Print Guido's text to the shell
print(guido_text)
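One possible way to fill in the blank steps is sketched below, wrapped in a function so it can be tried without a network connection. The function name title_and_text and the placeholder document are hypothetical, and the explicit "html.parser" argument is an assumption added here; the variable names guido_title and guido_text follow the instructions.

```python
from bs4 import BeautifulSoup


def title_and_text(html_doc):
    """Return the <title> Tag and the tag-free text of an HTML string."""
    # Create a BeautifulSoup object from the HTML
    soup = BeautifulSoup(html_doc, "html.parser")
    # Get the title element of the page
    guido_title = soup.title
    # Get all text on the page with the markup removed
    guido_text = soup.get_text()
    return guido_title, guido_text


# Offline check with a placeholder document (not the real page)
placeholder_html = (
    "<html><head><title>Placeholder</title></head>"
    "<body><h1>Heading</h1><p>Body text.</p></body></html>"
)
title, text = title_and_text(placeholder_html)
print(title)
print(text)
```

In the exercise itself, the same three steps would be applied to html_doc, the string taken from r.text, in place of the placeholder document.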