
Try your own nucleotide frequency plot

Now it's time to take a closer look at the frequency of nucleotides per cycle, that is, per base position in the reads. The best way to do this is with a visualization. The first few cycles usually show uneven nucleotide frequencies, and the frequencies should then stabilize in the later cycles.

This exercise uses the complete FASTQ file SRR1971253, with some pre-processing already done for you:

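library(ShortRead)
library(tidyverse) # provides %>%, as_tibble(), mutate(), pivot_longer(), ggplot()

# Read the FASTQ file
fqsample <- readFastq(dirPath = "data", 
                      pattern = "SRR1971253.fastq")

# Count each letter of the alphabet at every cycle of the read sequences
abc <- alphabetByCycle(sread(fqsample))

# Keep the first four rows (A, C, G, T) and transpose:
# one row per cycle, one column per nucleotide
nucByCycle <- t(abc[1:4,]) 

# Tidy dataset
nucByCycle <- nucByCycle %>% 
  as_tibble() %>% # convert to tibble
  mutate(cycle = 1:50) # add cycle numbers (one row per cycle, 50 in total)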
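
To make the shape of this data more concrete, here is a minimal base R sketch of the kind of per-cycle count matrix that the pre-processing produces. It is only an illustration: the reads are made-up toy data, the names `reads`, `bases`, `counts` and `by_cycle` are hypothetical, and the counting is done by hand rather than with alphabetByCycle().

```r
# Toy reads of equal length (hypothetical data, not taken from SRR1971253)
reads <- c("ACGTA", "ACGTC", "TCGAA", "GCGTA")

# Split each read into single bases: one row per read, one column per cycle
bases <- do.call(rbind, strsplit(reads, ""))

# Count each nucleotide at each cycle: rows = nucleotides, columns = cycles
nucleotides <- c("A", "C", "G", "T")
counts <- vapply(
  seq_len(ncol(bases)),
  function(i) as.integer(table(factor(bases[, i], levels = nucleotides))),
  integer(4)
)
rownames(counts) <- nucleotides

# Transpose so that each row is a cycle and each column a nucleotide,
# mirroring the t(abc[1:4, ]) step in the exercise
by_cycle <- t(counts)
by_cycle
```

Each row of `by_cycle` sums to the number of reads, which is a quick sanity check that every base was counted exactly once per cycle.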

Your task is to make a Nucleotide Frequency by Cycle plot using tidyverse functions!

This exercise is part of the course Introduction to Bioconductor in R.

Exercise instructions

  • glimpse() the object nucByCycle to get a view of the data.
  • Use pivot_longer() to gather the nucleotide columns into a new alphabet column, with their values in a new count column.
  • Make a line plot of cycle on the x-axis vs count on the y-axis, colored by alphabet.
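
If the wide-to-long reshaping is unfamiliar, the following sketch shows the same pattern on a small made-up table. It assumes the tibble, tidyr and ggplot2 packages are installed. The data and the names `toy`, `toy_long`, `nucleotide` and `n` are hypothetical and chosen only for illustration, so they differ from the names the exercise asks for.

```r
library(tibble)
library(tidyr)
library(ggplot2)

# Hypothetical toy counts: one row per cycle, one column per nucleotide
toy <- tibble(
  cycle = 1:3,
  A = c(5, 4, 6),
  C = c(3, 4, 2),
  G = c(1, 1, 1),
  T = c(1, 1, 1)
)

# Reshape to long format: one row per (cycle, nucleotide) pair.
# -cycle means "pivot every column except cycle".
toy_long <- pivot_longer(toy, -cycle, names_to = "nucleotide", values_to = "n")

# One line per nucleotide across cycles
p <- ggplot(toy_long, aes(x = cycle, y = n, color = nucleotide)) +
  geom_line()
p
```

The long format matters because ggplot2 maps one column to each aesthetic, so the nucleotide labels need to live in a single column before they can be mapped to color.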

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Glimpse nucByCycle
___

# Create a line plot of cycle vs. count
nucByCycle %>% 
  # Pivot the nucleotide columns into alphabet and put their values in a new count column
  pivot_longer(-cycle, names_to = ___, values_to = ___) %>% 
  ggplot(aes(x = ___, y = ___, color = ___)) +
  geom_line(size = 0.5) +
  labs(y = "Frequency") +
  theme_bw() +
  theme(panel.grid.major.x = element_blank())