The XPATH text() function

1. The XPATH text() function

So far, you have learned how CSS selectors can be translated to XPATH and how you can query web pages using either one. You've also been introduced to XPATH functions. An especially helpful one is the text() function.

2. A use case where CSS doesn't get you far

Have a look at the HTML table with the "cast" ID displayed here. There are three rows with two columns each, showing the fictional actors and their respective roles in an animated movie. The roles are wrapped in HTML emphasis elements (em). However, their details, that is, whether the actor voiced the role or performed its choreography, are plain text nodes that sit directly inside the parent td element. Let's say you only want to access these details and not the role. With CSS, there is no way to extract only the direct text of the td elements, as this example shows. The query returns not only the details in parentheses but also the text content of the em elements.
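The table from the slide is not reproduced in this transcript, so the following is only a minimal sketch of the situation. The HTML, the actor names, and the role names are made up for illustration; only the "cast" ID, the td/em structure, and the "role" class come from the description above. The "actor" class on the first column is an assumption.

```r
library(rvest)

# Hypothetical stand-in for the table on the slide (names are invented).
cast_html <- read_html('
<table id="cast">
  <tr><td class="actor">Ava Stone</td><td class="role"><em>Captain Finn</em> (Voice)</td></tr>
  <tr><td class="actor">Ben Ortiz</td><td class="role"><em>Luna</em> (Choreography)</td></tr>
  <tr><td class="actor">Cleo Marsh</td><td class="role"><em>Professor Owl</em> (Voice)</td></tr>
</table>')

# A CSS selector can only pick the td elements as a whole, so html_text()
# returns the em content together with the details in parentheses.
roles_css <- cast_html %>%
  html_elements("#cast td.role") %>%
  html_text()

roles_css
# "Captain Finn (Voice)" "Luna (Choreography)" "Professor Owl (Voice)"
```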

3. XPATH to the rescue!

XPATH and its text() function to the rescue! The first part of the XPATH query is equivalent to the CSS selector from before. In the second html_elements() call, though, you can use the XPATH text() function to select only the text that is a direct child of the td element. That way, text contained within further child elements like em is not selected. Notice also the trim argument given to html_text(). It strips surrounding whitespace, which removes the leading space before the opening parenthesis of each detail section.
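A sketch of the two-step query described here might look as follows. The exact XPATH used on the slide is not in the transcript, so the expressions below are one plausible way to write it, run against a made-up table (the names and the "actor" class are assumptions).

```r
library(rvest)

# Hypothetical stand-in for the table on the slide (names are invented).
cast_html <- read_html('
<table id="cast">
  <tr><td class="actor">Ava Stone</td><td class="role"><em>Captain Finn</em> (Voice)</td></tr>
  <tr><td class="actor">Ben Ortiz</td><td class="role"><em>Luna</em> (Choreography)</td></tr>
  <tr><td class="actor">Cleo Marsh</td><td class="role"><em>Professor Owl</em> (Voice)</td></tr>
</table>')

details <- cast_html %>%
  # Step 1: same selection as the CSS selector "#cast td.role"
  html_elements(xpath = '//*[@id = "cast"]//td[@class = "role"]') %>%
  # Step 2: only the text nodes that are direct children of each td
  html_elements(xpath = "./text()") %>%
  # trim = TRUE strips the leading space in " (Voice)"
  html_text(trim = TRUE)

details
# "(Voice)" "(Choreography)" "(Voice)"
```

Without trim = TRUE, each value would keep its leading space, for example " (Voice)".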

4. Another use case for text()

The text() function can do something else that is impossible with CSS: selection by text. What does this mean? Let's say you want to query only the rows in this table where the actor voiced the role. With XPATH and the text() function, you can select elements that match a certain text. To do that, you add another selection criterion to the td predicate. Besides querying for the class "role", you append a filter that matches only td elements containing a text node equal to " (Voice)". Note the leading space, which is part of the text. However, that's not the whole story.
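As a rough illustration of such a combined predicate, the sketch below filters the role cells of a made-up table (the names and the "actor" class are assumptions, and the slide's exact query is not in the transcript). The comparison is exact, so the string must include the leading space.

```r
library(rvest)

# Hypothetical stand-in for the table on the slide (names are invented).
cast_html <- read_html('
<table id="cast">
  <tr><td class="actor">Ava Stone</td><td class="role"><em>Captain Finn</em> (Voice)</td></tr>
  <tr><td class="actor">Ben Ortiz</td><td class="role"><em>Luna</em> (Choreography)</td></tr>
  <tr><td class="actor">Cleo Marsh</td><td class="role"><em>Professor Owl</em> (Voice)</td></tr>
</table>')

# Keep only the td elements with class "role" that have a direct text
# node equal to " (Voice)".
voice_cells <- cast_html %>%
  html_elements(
    xpath = '//*[@id = "cast"]//td[@class = "role" and text() = " (Voice)"]'
  )

html_text(voice_cells)
# "Captain Finn (Voice)" "Professor Owl (Voice)"
```

The result still contains only the "role" cells, not the full rows.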

5. Selecting parent elements

Remember that we wanted to query the whole rows containing the "actor" and "role" information, not just the "role" column. For that, the parent selector of XPATH comes in handy. With two dots, you select the parents of the previously selected nodes, which are the full tr elements in this case. That's yet another feature of XPATH where CSS selectors fall short. The parent selector is especially handy in combination with the text() function: first, you drill deep with a specific text filter, then you ascend one or more levels to return the enclosing parent elements. You might have noticed, though, that selecting nodes based on properties of their children was already introduced in the first video of this chapter. The method with the double dot is just another way of achieving the same goal.
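Both routes to the full rows can be sketched like this, again on a made-up table (the names and the "actor" class are assumptions). The first query ascends from the filtered td elements with "..", while the second puts the same condition into a predicate on tr, in the style of the child-based selection mentioned above.

```r
library(rvest)

# Hypothetical stand-in for the table on the slide (names are invented).
cast_html <- read_html('
<table id="cast">
  <tr><td class="actor">Ava Stone</td><td class="role"><em>Captain Finn</em> (Voice)</td></tr>
  <tr><td class="actor">Ben Ortiz</td><td class="role"><em>Luna</em> (Choreography)</td></tr>
  <tr><td class="actor">Cleo Marsh</td><td class="role"><em>Professor Owl</em> (Voice)</td></tr>
</table>')

# Route 1: filter the td elements by text, then step up to their parents.
voice_rows <- cast_html %>%
  html_elements(
    xpath = '//*[@id = "cast"]//td[@class = "role" and text() = " (Voice)"]'
  ) %>%
  html_elements(xpath = "..")

# Route 2: select tr elements that have a matching td child.
voice_rows_alt <- cast_html %>%
  html_elements(
    xpath = '//*[@id = "cast"]//tr[td[@class = "role" and text() = " (Voice)"]]'
  )

html_name(voice_rows)
# "tr" "tr"

# With the full rows in hand, the actor column is reachable again.
voice_actors <- voice_rows %>%
  html_elements("td.actor") %>%
  html_text()

voice_actors
# "Ava Stone" "Cleo Marsh"
```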

6. Let's practice!

In the following exercises, you'll look at some use cases for the text() function. You'll also practice the parent selector of XPATH. Good luck!
