Dot notation and SQL
1. Dot notation and SQL
Previously we learned how to use SQL tables. We now learn how query dataframes using either SQL queries or using dataframe dot notation. This allows us to compare and contrast these two notations.2. Our table has 3 columns
For example, suppose we have a dataframe containing three columns, and we want to select only two columns,3. We only need 2
We could do this.4. Three ways to select 2 columns
Printed again here is how we obtained the result in the previous slide. See that the column train_id is a string given in quotes. We can also do the following: using dot notation. This time the column is given as df.train_id. We can also import this column function: which allows us to do the following: This time, the column is given as an argument to this new operator. It may seem more verbose in this case. However, it is useful in other cases. Such as in the following,5. Two ways to rename a column
To rename a column we can use the withColumnRenamed function. But we could also use the column operator, like so:6. Don’t do this!
Pro tip: try not to use all three conventions at the same time without good reason.7. SQL queries using dot notation
Most Spark sql queries can be done in either dot notation or sql notation. Here’s an example. Notice that the limit operation is done at query time instead of at show time. We can get the same result using dot notation, like so: Note how we used the column operator to select the train_id column, and renamed it in place.8. Window function SQL
Window functions can also be done in either sql or dot notation. This query adds a number to each stop on a train line -- in a new column called id.9. Window function SQL
Note how the id column starts over for train_id 324.10. Window function using dot notation
Here’s the same result using dot notation. There is typically a dot notation equivalent of every sql clause including window functions. The row_number sql clause has an equivalent dot notation sql function. The inside of an OVER clause is handled by a Window object. The Window object provides methods for it to be partitioned - and ordered. Some people prefer the SQL version, other people prefer the dot notation.11. Using a WindowSpec
A "WindowSpec" is defined using the "Window" class, and then used subsequently as an argument to the over() function in a window function query. Here is an example.12. Let's practice
Let's practice!Create Your Free Account
or
By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.