Get startedGet started for free

Interrogation after SELECT

1. Interrogation after SELECT

If we are creating a query, we may want to interrogate the data first. What does it look like? How many columns are there? What columns are required? And what columns can we join on?

2. Processing order after SELECT

Previously we said that any syntax processed after SELECT is mainly concerned with actions on the final data extracted. Most of these actions are commonly used to interrogate the data before working on a query. We also said that these processes tend to be quite expensive as they can potentially tie up database processing resources and, therefore, are likely to increase the time it takes a query to run. In this lesson, we’ll look at some ways to interrogate data and the effects these may have on performance.

3. All is not always good

One of the first things we learn when taking a course in SQL is SELECT star from a table. SELECT star returns all rows and all columns from the table. This is great for data interrogation and getting to know your data but potentially bad for performance especially if that table has millions of rows and many columns. We could be waiting for a while.

4. All you need is better

Our query should only select the columns we need.

5. All in a JOIN

Another problem with SELECT star is that, in joins, it returns duplicates of the joining columns. In this example, the PlayerName column is in both the Players and PlayerStats tables.

6. All you need in a JOIN

We need to explicitly state the columns to be returned and from what tables. This is where aliasing is useful.

7. Rows at the TOP

Is there a way to restrict the number of rows returned but still provide us enough to interrogate the data adequately? Yes, we can use the TOP function. Although TOP is one of the last syntaxes to be processed in a query, we use at the beginning of the query, just after SELECT. TOP can be useful to limit the exact number or rows returned.

8. Percentage at the TOP

or, used in conjunction with PERCENT, a proportion of the rows to be returned. Because TOP only processes rows, we still need to specify the columns we require either using star, for all columns or explicitly identifying them after TOP. SELECT TOP 5 will return the first five rows and SELECT TOP 1 PERCENT will return the first 10,000 rows in a million-row table.

9. There is no top or bottom

What if we want to check the last rows of the table instead? TOP is only for the top of the data set right? Technically, yes. But there is no top or bottom. By using ORDER BY, and experimenting with ASC and DESC, we can view a subset of the data set sorted by any column, in the SELECT statement, ascending or descending.

10. The other row limiters

ORDER BY is supported by most major database vendors. However, when interrogating data, by limiting rows returned, be aware that TOP is specific to Microsoft SQL Server. In other courses, we may have been introduced to LIMIT or ROWNUM. These perform the same function but are specific to PostgreSQL and Oracle respectively. Also, their position in the query is quite different.

11. Where to use ORDER BY

Use ORDER BY with caution. It’s works well for interrogating data but slows performance in a query. Often data scientists will only use SQL to extract data from a database and use some other program like R or Python to perform analytics on the data output. If a data set requires sorting before analysis, these programs are a lot more efficient at this. Unless there is a good reason to sort the data in a query, or you want to interrogate the data, leave ORDER BY out.

12. Let's practice!

Let's practice

Create Your Free Account

or

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.