Get startedGet started for free

Modify a string using REGEXEXTRACT and REGEXREPLACE

1. Modify a string using REGEXEXTRACT and REGEXREPLACE

Now that we have learned the basics of regular expressions and how to test strings with REGEXMATCH, let's take things a step further and learn how to replace or extract portions of a string using REGEXREPLACE and REGEXEXTRACT, respectively.

2. Recall the digital marketing table...

Before we discuss these specific functions, in what situations would we want to replace or extract part of a string? You may have noticed that Python Users has an uppercase u, while R users has a lowercase u. Using REGEXREPLACE, we can write a regex that matches the lowercase u and replaces it with a capital U, making the table a little cleaner. Or, if we no longer wanted campaigns to include the term users, we could write a regular expression that would match and remove that from the string, making the campaign names R and Python.

3. REGEXREPLACE()

The output from REGEXREPLACE is the original string, plus the replacement string, excluding the part of the string matched by the regular expression. REGEXREPLACE is often used to clean up categorical data. Here is an example of the regular expression for the term users. The single period that is included before the word indicates the space between the words. This would be the correct regular expression to match both the space and the word users after the strings R and Python, and would replace it with the replacement string defined in the function. A common use case is to remove the word and space. To remove a portion of the string, put an empty string as the replacement string in the function.

4. REGEXEXTRACT()

REGEXEXTRACT will return a specific portion of a string. The parenthesis are included to indicate the group of characters to extract, and will return everything except the regular expression. If you do not use parenthesis, the extraction will be equal to the regular expression. The regular expression needed to extract the words Python and R from the user campaigns is similar to the one we saw previously. The period-asterisk combination, wrapped in parenthesis, defines the portion of the string to extract. If we do not provide a regular expression after the period-asterisk combination, we would extract all of the characters in the string. The second part of the regular expression narrows the extraction down to anything that ends with users, regardless of whether or not the word is capitalized. This is the part of the string to match, but not extract. Note that we also define the space between words by including a period before the regular expression that will match the word users.

5. Using REGEXREPLACE() and REGEXEXTRACT()

If you are looking to replace only a certain part of a certain number of strings, you can combine REGEXREPLACE or REGEXEXTRACT with REGEXMATCH wrapped in an IF statement. This will test to make sure the cell matches a criteria and replace or extract the part of the string if it does, or return whatever is defined in the IF False argument if it does not. The only difference between the two formulas shown is the second argument in the IF statement, where REGEXREPLACE is used when the string needs to be replaced, and REGEXEXTRACT is used when a portion of a string needs to be extracted.

6. Let's get to work!

REGEXREPLACE and REGEXEXTRACT are two very useful tools and their use cases will become more clear as you complete the following exercises. Let's get to work!

Create Your Free Account

or

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.