
Selector Objects

1. Introduction to the scrapy Selector

In this lesson we will begin to familiarize ourselves with scrapy's Selector object, so named because it is the scrapy object used to select portions of the HTML using XPath (or a so-called CSS Locator -- something we'll learn about later). Some of what we see in this lesson may seem a bit gnarly. But keep in mind that once we master this step, we will have already learned how to use the main parsing tool scrapy offers, letting us actually read in an HTML document and access the inner elements we want.

2. Setting up a Selector

Throughout this lesson, we will be using the Selector we set up in this slide as our running example. We import Selector from scrapy. We have made a string of HTML, which we pass to the Selector as its text argument, creating a Selector object "sel", which is the object we'll be learning to use. It will become clearer as we move along, but we can think of the Selector "sel" as having "selected" the entire HTML document. Before moving on, let's note that the HTML has two paragraph elements, the first saying "Hello World!" and the second saying "Enjoy DataCamp!".

3. Selecting Selectors

To put to work all the XPath notation we've built up, we can call the Selector's xpath method to create new Selector objects, selecting the pieces of HTML we are interested in. When doing so, the return value is a SelectorList (a list with some "scrapy extras") containing new Selector objects. For example, if we use an XPath to select all paragraph elements from our running example, we will have a SelectorList of two Selector objects, one for each of the paragraphs.

4. Extracting Data from a SelectorList

Selectors and SelectorLists are nice, but at the end of the day, we really want to access the data inside the Selector or SelectorList. We can do this easily by using the "extract" method. Calling extract on a SelectorList gives us a list of strings, where each string is the data (the selected HTML) from one of the Selectors in the SelectorList. If we only want the data (as a string) from the first element of a SelectorList, we can call the extract_first method instead.

5. Extracting Data from a Selector

Although extract_first is convenient if you want the first piece of data in a SelectorList, we could grab the data from any other Selector within our SelectorList. To give an example of this, let's say we create a SelectorList named ps, then take the second Selector in the list (remembering that Python indexes lists starting at 0, so we use the index 1 rather than 2). Then we apply the extract method to this Selector. Note that a Selector only has one piece of data, so extract leaves us with the string of this data (rather than a list of strings, as was the case with SelectorLists).

6. Select This Course!

In this lesson, we learned briefly how to set up a scrapy Selector object, but more importantly spent time learning how to use an XPath to select and extract pieces of the HTML code. Basically, you now have the knowledge to actually start scraping an HTML document once you have loaded the HTML into a Selector object. That is awesome!