1. Off the Beaten XPath
In this lesson, we are going to continue building on our XPath syntax, learning to navigate to elements based on their attributes, as well as to direct to the attribute information within elements.
I know that we're continuing with some stuff that seems technical, but we're close to the point where we can start using what we've learned to scrape real websites!
2. (At)tribute
Let's start by pointing out that in XPath notation, the @ symbol is used to mark attributes. For example, if we see @class, @id, or @href in an XPath expression, then it is referring to a class attribute, id attribute, or href attribute, respectively.
3. Brackets and Attributes
We saw before that square brackets can be used in XPath syntax to home in on a specific element or elements based on their order among siblings of a given generation. We can also include other information within square brackets to select specific elements.
4. Brackets and Attributes
For example, the XPath string here first directs to all paragraph elements with //p, and then narrows down to those whose class attribute is equal to "class-1". Note that the attribute value is written in quotation marks.
Now, my convention is to use single quotes to define the XPath string, and double quotes as needed within the XPath expression itself.
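To make the attribute predicate concrete, here is a minimal sketch using Python's standard-library ElementTree module, which understands a subset of XPath. The sample document and its text contents are hypothetical, not taken from the slides, and ElementTree requires a leading dot, so //p is written as .//p. Notice the quoting convention: single quotes around the XPath string, double quotes inside it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample document (not from the lesson slides).
html = '''
<html>
  <body>
    <div id="uid">
      <p class="class-1">Hello</p>
      <p class="class-2">World</p>
    </div>
    <div id="other">
      <p class="class-1">Again</p>
      <p class="class-12">More</p>
    </div>
  </body>
</html>
'''
root = ET.fromstring(html.strip())

# ElementTree needs a leading dot, so the lesson's //p becomes .//p here.
xpath = './/p[@class="class-1"]'
texts = [p.text for p in root.findall(xpath)]
print(texts)  # ['Hello', 'Again']
```

Only the paragraphs whose class attribute is exactly "class-1" come back; the "class-12" paragraph is not matched.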
5. Brackets and Attributes
As another example, we could use the expression here with the wildcard character to first direct to all elements within the HTML document, and reduce down to whichever element has "uid" as its id attribute.
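As a rough illustration of the wildcard version, the sketch below again uses the standard-library ElementTree module on a hypothetical document. The leading dot in .//* is an ElementTree requirement; the expression otherwise corresponds to //*[@id="uid"].

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, assumed for illustration only.
html = '''
<html>
  <body>
    <div id="uid">
      <p class="class-1">Hello</p>
      <p class="class-2">World</p>
    </div>
    <div id="other">
      <p class="class-1">Again</p>
    </div>
  </body>
</html>
'''
root = ET.fromstring(html.strip())

# The wildcard * matches any element; the predicate keeps those with id "uid".
xpath = './/*[@id="uid"]'
tags = [el.tag for el in root.findall(xpath)]
print(tags)  # ['div']
```

Because the wildcard does not care about the tag name, this finds the element with that id whether it is a div, a p, or anything else.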
6. Brackets and Attributes
Or, we could combine what we know to first navigate to the div element with id attribute equal to "uid", and then collect the second paragraph child of that div element.
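Combining an attribute predicate with a positional one might look like the following sketch, again with the standard-library ElementTree module and a hypothetical document. As in XPath generally, the position index starts at 1, so p[2] is the second paragraph child.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, assumed for illustration only.
html = '''
<html>
  <body>
    <div id="uid">
      <p class="class-1">Hello</p>
      <p class="class-2">World</p>
      <p class="class-3">Third</p>
    </div>
  </body>
</html>
'''
root = ET.fromstring(html.strip())

# First select the div with id "uid", then take its second p child.
xpath = './/div[@id="uid"]/p[2]'
second = [p.text for p in root.findall(xpath)]
print(second)  # ['World']
```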
7. Content with Contains
A useful tool we can include within our square-bracketed expression is the "contains" function. The format of the "contains" function is given abstractly here: the left argument is the attribute name (including the @ symbol), and the right argument is the string we want to search for within that attribute. The function looks at the value of the named attribute on each candidate element and matches those elements where the search string is a substring of the full attribute value.
8. Contain This
To make this clearer, let's look at an example. The expression here will choose all elements in which the string "class-1" is contained as a substring within the full class attribute; this even includes the third paragraph belonging to class-12, because class-1 is a substring of class-12.
9. Contain This
This differs from the exact-match expressions we saw earlier: without the contains function, the expression here matches only elements whose entire class attribute is equal to "class-1".
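The difference between the two kinds of match can be sketched in Python. One caveat: the standard-library ElementTree module does not implement the contains function, so this sketch mirrors its substring semantics with a plain Python check rather than evaluating the XPath itself; a fuller XPath engine would accept an expression such as //*[contains(@class, "class-1")] directly. The document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, assumed for illustration only.
html = '''
<html>
  <body>
    <p class="class-1">Hello</p>
    <p class="class-1">Again</p>
    <p class="class-12">More</p>
    <p class="class-2">Other</p>
  </body>
</html>
'''
root = ET.fromstring(html.strip())

# Emulates //*[contains(@class, "class-1")]: substring test on the attribute.
# Elements with no class attribute are treated as having an empty string.
partial = [el.text for el in root.iter() if "class-1" in el.get("class", "")]

# Exact match, //*[@class="class-1"], which ElementTree does support.
exact = [el.text for el in root.findall('.//*[@class="class-1"]')]

print(partial)  # ['Hello', 'Again', 'More']
print(exact)    # ['Hello', 'Again']
```

The "class-12" paragraph shows up only in the substring version, since "class-1" is contained in "class-12" but is not equal to it.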
10. Get Classy
Now, let's consider how to direct to the attribute information itself.
To do so, we first create an XPath expression for the element or elements we want to pull attribute information from. Say we would like to direct to the class attribute of this highlighted paragraph element.
We already know how to direct to the highlighted area.
11. Get Classy
To direct to the attribute itself, we take the XPath, follow it with a forward slash, and follow that with the @ symbol attached to the attribute name of interest, in this case, class.
As a quick note, if we were instead to use a double forward slash before the @ symbol with the attribute name, we would direct not only to that attribute of the elements selected by the XPath, but also to that attribute of all of their descendants.
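The single-slash versus double-slash behavior can be sketched as follows. The standard-library ElementTree module cannot select attributes with /@class, so this sketch selects the elements with XPath and then reads the attribute in Python, which is an emulation of what expressions like //div[@id="uid"]/@class and //div[@id="uid"]//@class would return in a fuller XPath engine. The document and the "box" class name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, assumed for illustration only.
html = '''
<html>
  <body>
    <div id="uid" class="box">
      <p class="class-1">Hello</p>
      <p class="class-2">World</p>
    </div>
  </body>
</html>
'''
root = ET.fromstring(html.strip())

selected = root.findall('.//div[@id="uid"]')

# Emulates .../@class: the class attribute of the selected elements only.
direct = [el.get("class") for el in selected if "class" in el.attrib]

# Emulates ...//@class: the selected elements plus all of their descendants.
# Element.iter() yields the element itself first, then its descendants.
deep = [d.get("class") for el in selected for d in el.iter() if "class" in d.attrib]

print(direct)  # ['box']
print(deep)    # ['box', 'class-1', 'class-2']
```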
12. End of the Path
In this lesson, we've looked at how to use the at-symbol attribute notation in an XPath to navigate to elements based on their attributes, as well as navigate to the actual attribute information itself.