Introduction to string manipulation
1. Introduction to string manipulation
Welcome to this course! My name is Eugenia and I will guide you in your journey to master regular expressions.
2. You will learn
In this course, you will learn how to manipulate strings to find and replace specific substrings. You will also explore different approaches for string formatting, such as interpolating a string in a template. Last, you will dive into basic and advanced regular expressions to master how to find complex patterns in a string.
3. Why it is important
As a data scientist, you can encounter strings when cleaning a dataset to prepare it for text mining or sentiment analysis. Sometimes, you will need to process text to feed an algorithm that determines whether an email is spam. Or you may need to parse and extract specific data from a website to build a database. Learning to manipulate strings and master regular expressions will allow you to perform these tasks faster and more efficiently.
4. Strings
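A minimal sketch of the quoting rules for this slide. The variable names and example text are assumptions chosen for illustration, not the original slide content:

```python
# Single and double quotes create the same kind of string object.
single = 'DataCamp'
double = "DataCamp"
same_value = single == double  # True: only the delimiters differ

# When the text itself contains a quote character,
# enclose the string in the other quote type.
apostrophe = "It's a nice day"
spoken = 'She said "hello"'

print(same_value)
print(apostrophe)
print(spoken)
```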
The first step of our journey is strings, a data type used to represent textual data. Python recognizes any sequence of characters inside quotes as a single string object. As shown on the slide, either single or double quotes can be used, but you must use the same quote type to open and close the string. If a quote character is part of the string, as seen in the code, we need to use the other quote type to enclose the string. Otherwise, Python treats the second quote as the closing one.
5. More strings
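A small sketch of the two built-in functions for this slide. The string "Awesome day" and the number 123 are assumed examples; the string was picked because it has eleven characters, matching the narration:

```python
# Hypothetical example string (11 characters, counting the space).
my_string = "Awesome day"

# len() returns the number of characters in the string.
length = len(my_string)

# str() returns the string representation of an object, here an integer.
as_text = str(123)

print(length)   # 11
print(as_text)  # 123
```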
Python has built-in functions to handle strings. Suppose we define the following string. We can get the number of characters in the string by applying the function len(), which returns eleven, as shown in the output. The function str() returns the string representation of an object, as seen in the code.
6. Concatenation
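A minimal sketch of concatenation for this slide, assuming two hypothetical strings named my_string1 and my_string2:

```python
# Two hypothetical strings to join.
my_string1 = "Awesome day"
my_string2 = "for biking"

# The + operator builds a new string; the space must be added explicitly,
# because concatenation inserts no separator of its own.
combined = my_string1 + " " + my_string2

print(combined)  # Awesome day for biking
```

The original strings are left unchanged, since Python strings are immutable and the + operator always produces a new object.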
Suppose now we have the following two strings shown on the slide, and you want to concatenate them. Concatenating means obtaining a new string that contains both of the original strings. Applying the plus operator to join both strings, and explicitly adding the space between them, generates the output seen in the code.
7. Indexing
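A short sketch of positive and negative indexing for this slide. The string "Awesome day" is an assumed example:

```python
# Hypothetical example string.
my_string = "Awesome day"

# Indexing is zero-based, so the fourth character has index 3.
fourth = my_string[3]

# Negative indices count from the end; -1 is the last character.
last = my_string[-1]

print(fourth)  # s
print(last)    # y
```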
Individual characters of a string can be accessed directly using an index, that is, the position of that character within the string. Let's work with the following example. To get the fourth character of the string, we specify the string name followed by the position inside square brackets. In Python, string indexing is zero-based, meaning that the first character has index zero, as shown on the slide. For the fourth character, we specify index three, getting the following output. We can also indicate indices with negative numbers. If we specify index minus one, we get the last character of the string, as shown in the output.
8. Slicing
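A minimal sketch of slicing for this slide, again assuming the example string "Awesome day":

```python
# Hypothetical example string.
my_string = "Awesome day"

# Characters from index 0 up to, but not including, index 3.
first_three = my_string[0:3]

# Omitting the start index begins the slice at the start of the string.
up_to_five = my_string[:5]

# Omitting the end index runs the slice to the end of the string.
from_five = my_string[5:]

print(first_three)  # Awe
print(up_to_five)   # Aweso
print(from_five)    # me day
```

Because the end position is excluded, my_string[:5] and my_string[5:] together cover the whole string with no overlap.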
With the bracket notation, Python allows you to access a specific part or sequence of characters within the original string. To do so, we specify the starting and ending positions inside square brackets, separated by a colon, as you see on the slide. The ending position is excluded from the resulting output. Omitting the first index makes the slice start at the beginning of the string, and omitting the second makes it run until the end, as shown in the output.
9. Stride
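A brief sketch of slicing with a step for this slide. The string and the step of three are assumptions, chosen so that two characters are skipped between retrieved ones, as the narration describes:

```python
# Hypothetical example string.
my_string = "Awesome day"

# Between positions 0 and 6, take every third character:
# indices 0 and 3 are kept, the two characters in between are skipped.
every_third = my_string[0:6:3]

# Omitting start and end with a step of -1 walks the string backwards.
reversed_string = my_string[::-1]

print(every_third)      # As
print(reversed_string)  # yad emosewA
```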
String slicing also accepts a third index, the step or stride, which specifies how far to advance between retrieved characters. In the example, the specified indices return the following output: the characters retrieved between positions zero and six, skipping two characters between each one retrieved. Interestingly, omitting the first and second indices and designating a step of minus one returns a reversed string, as shown in the output.
10. Let's practice!
Now you are ready to start manipulating strings by yourself.