What is code profiling

1. How do I find the bottleneck?

In the first chapter of this course, we discussed benchmarking functions: timing multiple functions and selecting the fastest solution. Many applications, however, have a rate-limiting section of code.

2. How do I find the bottleneck?

This is essentially a bottleneck that slows down the overall speed of execution. However, you don't always know where the bottleneck is, and if you don't know where it is, how would you know which part to fix? The correct way to determine where the hold-up lies is to use code profiling.

3. Code profiling

The idea is simple. You run your code and take snapshots of what the program is doing at regular intervals. Functions that show up in many snapshots are where the program spends most of its time, so this gives you data on how long each function took to run. R comes with a built-in tool for code profiling called Rprof(). Unfortunately, Rprof() isn't that user-friendly. An alternative way of profiling is to use the profvis package, which runs Rprof() behind the scenes and presents the results interactively. Let's jump in with an example.
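For reference, here is a minimal base R sketch of the Rprof() workflow: start the sampler, run the code, stop the sampler, then summarize the samples with summaryRprof(). The function slow_cumsum() is hypothetical and deliberately inefficient, written only so the profiler has something to sample; the 10 ms interval and the vector length are arbitrary choices.

```r
# Hypothetical, deliberately slow function: it grows `out` one element at a
# time and re-sums the whole prefix of `x` on every iteration.
slow_cumsum <- function(x) {
  out <- numeric(0)
  for (i in seq_along(x)) {
    out <- c(out, sum(x[seq_len(i)]))
  }
  out
}

x <- runif(15000)
prof_file <- tempfile(fileext = ".out")

Rprof(prof_file, interval = 0.01)  # start sampling the call stack every 10 ms
res <- slow_cumsum(x)
Rprof(NULL)                         # stop profiling

prof <- summaryRprof(prof_file)     # aggregate the recorded samples
head(prof$by.self)                  # time spent in each function's own code
```

The by.self table counts time spent in a function's own code, while by.total also includes time spent in the functions it calls. Digging through these raw tables is the chore that profvis is meant to spare you.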

4. IMDb data set

The IMDb data set is a data frame with roughly sixty thousand rows and twenty-four columns. Each row corresponds to a particular movie. For example, row seven thousand, two hundred and eighty-eight

5. Braveheart

corresponds to the classic movie Braveheart. This amazing, but historically suspect, film has been given a user rating of eight point three out of ten. How does this movie gem compare with other similar titles?

6. Example: Braveheart

First, we load the data set from the package and extract the Action movies. Next, we generate a scatter plot of movie rating against year and fit a local regression line to get an idea of the trend. Finally, we highlight Braveheart. I realize that some of you are thinking that base graphics is soooo last decade and all the cool kids use ggplot2. The downside of ggplot2 is that its call stack is complicated, which makes the profile harder to read. So for this example, we've kept things simple.
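A minimal sketch of these steps in base graphics follows. It assumes the data set is the `movies` data frame from the ggplot2movies package (install it with install.packages("ggplot2movies") if needed), that row 7288 is Braveheart as stated above, and that the local regression is fitted with stats::loess() at its default settings.

```r
# Assumes the ggplot2movies package is installed.
data(movies, package = "ggplot2movies")   # load the IMDb data set

braveheart <- movies[7288, ]               # row 7288 is Braveheart (per the course)
movies <- movies[movies$Action == 1, ]     # keep only the Action movies

# Scatter plot of rating against year
plot(movies$year, movies$rating, xlab = "Year", ylab = "Rating")

# Local regression (loess) to show the trend; order() sorts by year so
# lines() draws the fitted curve from left to right
model <- loess(rating ~ year, data = movies)
j <- order(movies$year)
lines(movies$year[j], model$fitted[j])

# Highlight Braveheart with a large filled point
points(braveheart$year, braveheart$rating, pch = 21, bg = "steelblue", cex = 3)
```

Note that Braveheart is extracted before filtering, because the row index 7288 refers to the full data set.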

7. Profvis

Profiling this code is straightforward. If you're using RStudio, just highlight the code and select "Profile Selected Line(s)" from the Profile menu.

8. Command line

Alternatively, you can call the profvis() function directly. Notice that the curly braces allow us to pass the entire block of code to profvis() as a single expression. Before I reveal the profile, which line of code do you think will be slowest? Pause the video for a moment, read the code and have a guess. Resume the video when you want to know the answer. Running this script generates an interactive page that describes the amount of time we spend on

9. The flame graph

each line of code. The two measurements returned by profvis are the amount of memory used, in megabytes, and the time, in milliseconds, we spend in each function. Don't worry about the actual units, though. Instead, focus on the relative contribution of each component.
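The profvis() call described above might look like the following sketch. It assumes the profvis and ggplot2movies packages are installed, and simply wraps the plotting steps in curly braces so profvis() receives them as one expression. The return value is an htmlwidget; printing it in an interactive session opens the profile page.

```r
library(profvis)  # assumes profvis is installed from CRAN

p <- profvis({
  data(movies, package = "ggplot2movies")   # assumes ggplot2movies is installed
  braveheart <- movies[7288, ]
  movies <- movies[movies$Action == 1, ]
  plot(movies$year, movies$rating, xlab = "Year", ylab = "Rating")
  model <- loess(rating ~ year, data = movies)
  j <- order(movies$year)
  lines(movies$year[j], model$fitted[j])
  points(braveheart$year, braveheart$rating, pch = 21, bg = "steelblue", cex = 3)
})

# Only display the widget when a viewer or browser is available
if (interactive()) print(p)
```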

10. The flame graph

Unsurprisingly, generating the local polynomial regression line is the most time-consuming step. Somewhat more surprising is that the order() operation is almost instantaneous compared with the other function calls. Now it's your turn to profile.
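To sanity-check the profiler's verdict on order() versus the loess fit, a simple system.time() comparison is enough. This sketch uses simulated data, a hypothetical stand-in for the Action movies rather than the real ratings, so the absolute timings will differ from the profile; only the relative sizes matter.

```r
set.seed(1)
n <- 5000
# Simulated stand-in for the Action movies: random years and ratings
sim <- data.frame(
  year   = sample(1900:2005, n, replace = TRUE),
  rating = runif(n, min = 1, max = 10)
)

# Elapsed wall-clock time (seconds) for each step
t_loess <- system.time(model <- loess(rating ~ year, data = sim))[["elapsed"]]
t_order <- system.time(j <- order(sim$year))[["elapsed"]]

c(loess = t_loess, order = t_order)
```

The order() timing will often print as zero, because system.time() cannot resolve such short intervals; for finer comparisons, a dedicated benchmarking package such as microbenchmark is better suited.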

11. Let's practice!