Multiple assessment

1. Multiple and parallel sequence quality assessment

When dealing with big files, we want to save time and resources. The Rqc package from Bioconductor is a Quality Control Tool for High-Throughput Sequencing Data. It performs parallel processing of entire files to help you assess their quality.

2. Rqc

The package Rqc from Bioconductor will give you a summary report of all your sequence files at once! It uses some of the basic packages that you have learned so far, like Biostrings, IRanges, and S4Vectors, in addition to packages that you will discover in the other Bioconductor courses, such as Rsamtools, GenomicAlignments, and GenomicFiles. Its functionality is combined and flavored with CRAN packages like knitr, markdown, and ggplot2 to create a clean report with multiple graphics. In addition, it uses Rcpp for fast compiled code and BiocParallel for parallel processing.

3. rqcQA

The Rqc function for Quality Assessment is named rqcQA. It receives, as input, the full paths of the fastq files, either compressed or uncompressed. The resulting object is a list. Each item stores the quality assessment of one input file - you can check them using names(). For each of the files, you will get an RqcResultSet class object.
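As a minimal sketch of the call described above: the two example fastq files shipped with the ShortRead package are used here purely for illustration (an assumption, not the files from the course), and the object names `files` and `qa` are arbitrary.

```r
library(Rqc)

# Illustrative input: example fastq files bundled with ShortRead.
# Replace with the full paths to your own files.
folder <- system.file(package = "ShortRead", "extdata/E-MTAB-1147")
files <- list.files(folder, pattern = "fastq.gz$", full.names = TRUE)

# Run the quality assessment; workers = 1 keeps this example serial.
qa <- rqcQA(files, workers = 1)

class(qa)       # the result is a list
names(qa)       # one entry per input file
class(qa[[1]])  # each entry is an RqcResultSet object
```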

4. rqcQA arguments

One super useful argument for running rqcQA is workers, which lets you read files in parallel using your available cores. If you have 8 cores in your computer, the recommendation is to use up to 6 workers. The result stores only the quality assessment, not the files themselves. If you want to run the quality assessment on a subset of the input, use the argument sample together with n, the number of reads to sample. In the example, we selected 500 reads. The recommendation is to set a seed before sampling, so the result is reproducible. If your sequencing run is "single-end", then you are set, because by default rqcQA treats all files as single-end. If it is "paired-end", you have two files per sample id. To define the pairs, pass a numeric vector to the argument pair, like in the example: files that share a number belong to the same pair, so the first two files are one pair and the next two are the following pair.
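The arguments above could be combined roughly as follows. This is a sketch under assumptions: the input is again the ShortRead example data (two files that form a single pair), the seed value is arbitrary, and the pair vector is built on the assumption that the two files of each pair sit next to each other in `files`.

```r
library(Rqc)

# Illustrative input: example fastq files bundled with ShortRead.
folder <- system.file(package = "ShortRead", "extdata/E-MTAB-1147")
files <- list.files(folder, pattern = "fastq.gz$", full.names = TRUE)

# Sample 500 reads per file, using 2 parallel workers.
# Setting a seed first is meant to make the sampling reproducible.
set.seed(1111)
qa_sample <- rqcQA(files, workers = 2, sample = TRUE, n = 500)

# Paired-end data: files sharing a number belong to the same pair.
# With four files this would give c(1, 1, 2, 2); with the two
# example files it gives c(1, 1).
pair_ids <- rep(seq_len(length(files) / 2), each = 2)
qa_paired <- rqcQA(files, workers = 2, pair = pair_ids)
```

If mates are not adjacent after sorting the file names, the pair vector has to be written by hand instead.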

5. rqcReport and RqcResultSet

To create a Quality Control HTML Report, use rqcReport(). The main difference between this and the previous report you learned from ShortRead is that it accepts a custom template, written in R markdown, for generating your own reports. Then use browseURL() to show the report in a browser. Additionally, you can pick which parts of the RqcResultSet to display by using the accessors listed by the methods function with class "RqcResultSet" in quotes.
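A sketch of the report workflow, assuming the ShortRead example files as input and relying on rqcReport() returning the path of the report it writes. The custom template is, to the best of my knowledge, passed through an argument named templateFile; treat that name as an assumption and check ?rqcReport before relying on it.

```r
library(Rqc)

# Illustrative input: example fastq files bundled with ShortRead.
folder <- system.file(package = "ShortRead", "extdata/E-MTAB-1147")
files <- list.files(folder, pattern = "fastq.gz$", full.names = TRUE)
qa <- rqcQA(files, workers = 1)

# Build the HTML report with the default template and keep its path.
report_file <- rqcReport(qa)

# Open the report in a browser when running interactively.
if (interactive()) browseURL(report_file)

# List the accessors and other methods available for the result class.
methods(class = "RqcResultSet")
```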

6. perFileInformation

This summary table is the output of the function perFileInformation() applied to the object qaRqc. It shows the files that you are going to explore in the last exercise. These were downloaded from the Sequence Read Archive (SRA). Each file has about 2 million reads. To share a QA report, you don't need to copy the files. You can save the results of the quality assessment and share those instead.
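For illustration, the summary table and one possible way of saving the results might look like this. The input files are the ShortRead examples rather than the SRA files from the course, and saveRDS()/readRDS() are just one base R option for storing the object; the transcript does not say which method was used.

```r
library(Rqc)

# Illustrative input: example fastq files bundled with ShortRead.
folder <- system.file(package = "ShortRead", "extdata/E-MTAB-1147")
files <- list.files(folder, pattern = "fastq.gz$", full.names = TRUE)
qa <- rqcQA(files, workers = 1)

# One row of summary information per input file.
info <- perFileInformation(qa)
info

# Save only the quality assessment (not the fastq files) for sharing.
qa_path <- file.path(tempdir(), "qa_results.rds")
saveRDS(qa, qa_path)

# A collaborator can load the results without access to the raw files.
qa_restored <- readRDS(qa_path)
```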

7. Plot functions

Rqc has 12 plotting functions for you to use! These are listed here, and they are easy to find because they all start with rqc and finish with Plot! A cool tip is to call a function individually and save its output as a PDF. You will have a clean plot of your quality assessment!
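The tip above can be sketched as follows, assuming the ShortRead example files as input and an arbitrary output file name. The exact number of functions returned by the listing may differ between Rqc versions.

```r
library(Rqc)

# Illustrative input: example fastq files bundled with ShortRead.
folder <- system.file(package = "ShortRead", "extdata/E-MTAB-1147")
files <- list.files(folder, pattern = "fastq.gz$", full.names = TRUE)
qa <- rqcQA(files, workers = 1)

# Find the plotting functions by their naming pattern: rqc...Plot.
plot_functions <- ls("package:Rqc", pattern = "^rqc.*Plot$")
plot_functions

# Save a single plot as a PDF; print() is needed inside scripts
# so that the plot is actually drawn on the PDF device.
pdf_file <- file.path(tempdir(), "cycle_average_quality.pdf")
pdf(pdf_file, width = 8, height = 5)
print(rqcCycleAverageQualityPlot(qa))
dev.off()
```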

8. rqcCycleBaseCallsLinePlot

I can't tell you how useful this package is without showing you any of the plots. Here is a line multi-plot of the base call distribution per cycle. This was created using the function rqcCycleBaseCallsLinePlot() with the QA object - only one line of code.
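The one-liner mentioned here would look roughly like this; the setup lines only recreate a QA object from the ShortRead example files, which stand in for the course data.

```r
library(Rqc)

# Illustrative input: example fastq files bundled with ShortRead.
folder <- system.file(package = "ShortRead", "extdata/E-MTAB-1147")
files <- list.files(folder, pattern = "fastq.gz$", full.names = TRUE)
qa <- rqcQA(files, workers = 1)

# The plot itself: base calls per cycle, one panel per file.
base_calls_plot <- rqcCycleBaseCallsLinePlot(qa)
base_calls_plot
```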

9. Keep calm

So what are you going to do the next time you get a hold of a ton of fastq files? KEEP CALM and use a parallel quality assessment!

10. You are ready!

Believe it! You have reached the top of the hill!