Try your own nucleotide frequency plot
Now it's time to take a closer look at the frequency of nucleotides per cycle. The best way to do this is with a visualization. Usually, the frequencies fluctuate during the first few cycles and then stabilize in later cycles.
This exercise uses the complete fastq file SRR1971253, with some pre-processing done for you:
library(ShortRead)
library(tidyverse)  # provides %>%, as_tibble() and mutate()

fqsample <- readFastq(dirPath = "data",
                      pattern = "SRR1971253.fastq")

# Extract reads and count each letter per cycle
abc <- alphabetByCycle(sread(fqsample))

# Transpose nucleotides A, C, G, T per column
nucByCycle <- t(abc[1:4,])

# Tidy dataset
nucByCycle <- nucByCycle %>%
  as_tibble() %>%        # convert to tibble
  mutate(cycle = 1:50)   # add cycle numbers
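To make the pre-processing above more concrete, here is a minimal base R sketch of the kind of table it produces: counts of A, C, G, and T at each read position, transposed so that each row is a cycle. It is an illustration only, not how ShortRead implements alphabetByCycle(); the reads and the names reads, letters_by_read, counts, and toy_by_cycle are made up for the example.

```r
# Toy reads of equal length (hypothetical data, not taken from SRR1971253)
reads <- c("ACGTA", "ACGTC", "TCGTA", "ACGAA")
nucs  <- c("A", "C", "G", "T")

# One row per read, one column per cycle (position in the read)
letters_by_read <- do.call(rbind, strsplit(reads, ""))

# 4 x n_cycles matrix: one row per nucleotide, one column per cycle.
# factor(levels = nucs) keeps nucleotides with a zero count.
counts <- vapply(
  seq_len(ncol(letters_by_read)),
  function(i) as.integer(table(factor(letters_by_read[, i], levels = nucs))),
  integer(4)
)
rownames(counts) <- nucs

# Transpose so each row is a cycle and each column a nucleotide
toy_by_cycle <- t(counts)
toy_by_cycle
```

Each row of toy_by_cycle sums to the number of reads, which is a quick sanity check on a table of this kind.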
Your task is to make a Nucleotide Frequency by Cycle plot using tidyverse functions!
This exercise is part of the course
Introduction to Bioconductor in R
Exercise instructions
- glimpse() the object nucByCycle to get a view of the data.
- Pivot the nucleotide letters in alphabet using pivot_longer() and get a new count column.
- Make a line plot of cycle on the x-axis vs count on the y-axis, colored by alphabet.
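If pivoting from wide to long is unfamiliar, the following sketch shows the general shape of the operation on a tiny made-up table. It assumes the tibble and tidyr packages are installed; the data and the names toy_wide, toy_long, nucleotide, and n are hypothetical and deliberately differ from the ones the exercise asks for.

```r
library(tibble)
library(tidyr)

# Hypothetical wide table: one row per cycle, one column per nucleotide
toy_wide <- tibble(
  cycle = 1:3,
  A = c(3L, 0L, 0L),
  C = c(0L, 4L, 0L),
  G = c(0L, 0L, 4L),
  T = c(1L, 0L, 0L)
)

# Pivot every column except cycle into name/value pairs:
# old column names go to "nucleotide", their values go to "n"
toy_long <- pivot_longer(toy_wide, -cycle,
                         names_to = "nucleotide", values_to = "n")
toy_long
```

The long table has one row per cycle and nucleotide combination, which is the layout ggplot2 needs in order to map a single column to the color aesthetic.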
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Glimpse nucByCycle
___
# Create a line plot of cycle vs. count
nucByCycle %>%
  # Pivot the nucleotide letters into alphabet and get a new count column
  pivot_longer(-cycle, names_to = ___, values_to = ___) %>%
  ggplot(aes(x = ___, y = ___, color = ___)) +
  geom_line(size = 0.5) +
  labs(y = "Frequency") +
  theme_bw() +
  theme(panel.grid.major.x = element_blank())