Illumina TruSeq stranded mRNA sequencing: How it works

July 29, 2018

My research project involves a lot of RNA sequencing, mostly Illumina HiSeq paired-end. I have long sought to understand how exactly the sequencing reads listed in the FASTQ files were actually positioned relative to the starting RNA molecule. After some research on the internet, I made this illustrated explanation based on a concrete example. I hope it can help those who—like me—were a bit lost in the beginning!

Introduction

This post aims to explain how Illumina paired-end sequencing reads are produced with the TruSeq Stranded mRNA kit. Information was taken from the Sample Preparation Guide and the Reference Guide provided by Illumina, as well as from a detailed document from Tufts University. The great RNA-seqlopedia edited by researchers from the University of Oregon is another gold mine of information for those interested in all aspects of RNA-seq technologies.

All oligonucleotide sequences given in this post (primers, adapters, flow cell-bound oligos) are proprietary to Illumina. They are subject to the following legal copyright notice:

Oligonucleotide sequences © 2018 Illumina, Inc. All rights reserved. Derivative works created by Illumina customers are authorized for use with Illumina instruments and products only. All other uses are strictly prohibited.

RNA cleavage and priming

Consider a fictional mRNA having the following sequence:

5’ GAACUGAUAAAGAGGAAACUAAAGCCACCCCAGAGUAUUUCUGAUUGCAGCCAUUGGUGCCUGCCUGGAAUGCCC

   GAUACAUGGAAUAGGUUACUAUAUGCAUCUCUGCUUUUGGAUCACCCCAUUGAUUCCCCUAGUCUAUCACAUUUG

   GUCUCUUGAAAAUCAGAGAUUUACCAUCAGUAGGCUAUAUUAGUUGUUGUAUUAGUGUGGAUAGUGGUGGCAAUU

   UCAACCCAUAUUUUCUUAACCAGCCCGUCAUAAACCCAUUUGGGUUAACCCAUUUUCAACCCGCAAUAUAAUGGA

   UCAAAUGUGAUUUCAUAUUUCUAUUGGAUUCCUCCUGCUAGCCUAAGCGAGAUAUCUACUGAGGGAACUUUUGAA

   CUUUCAUAUGAGUGUAUAAACGUGUAUGUUGAUUGUUAAAAAAAAAAAAAAAAAA 3’

RNAs are first heated (8 min at 94 °C) to generate fragments of random sizes. Let’s work with the following 200-nt fragment in the rest of this post:

5’ CAGAGAUUUACCAUCAGUAGGCUAUAUUAGUUGUUGUAUUAGUGUGGAUAGUGGUGGCAAUUUCAACCCAUAUUU

   UCUUAACCAGCCCGUCAUAAACCCAUUUGGGUUAACCCAUUUUCAACCCGCAAUAUAAUGGAUCAAAUGUGAUUU

   CAUAUUUCUAUUGGAUUCCUCCUGCUAGCCUAAGCGAGAUAUCUACUGAG 3’

mRNA fragments are hybridized to a mix of random hexamers, which will be used as primers for the cDNA first strand synthesis:

5’ CAGAGAUUUACCAUCAGUAGGCUAUAUUAGUUGUUGUAUUAGUGUGGAUAGUGGUGGCAAUUUCAACCCAUAUUU

   UCUUAACCAGCCCGUCAUAAACCCAUUUGGGUUAACCCAUUUUCAACCCGCAAUAUAAUGGAUCAAAUGUGAUUU

   CAUAUUUCUAUUGGAUUCCUCCUGCUAGCCUAAGCGAGAUAUCUACUGAG 3’

                             ← 3’ TTCGCT 5’

When sequencing mRNAs, random primers give a more uniform distribution of fragments along RNA molecules. Using oligo-dT primers would lead to an over-representation of polyA-containing fragments stemming from the 3′ ends of mRNAs. Even if fragmentation were performed after oligo-dT-primed cDNA synthesis, we would still get a 3′-enriched distribution of fragments, due to the low processivity of reverse transcriptases.

Reverse transcription

First strand synthesis

The SuperScript II reverse transcriptase is used to synthesize the first cDNA strand, which is complementary to the initial mRNA. Actinomycin D is included to prevent DNA-dependent DNA synthesis.

5’ CAGAGAUUUACCAUCAGUAGGCUAUAUUAGUUGUUGUAUUAGUGUGGAUAGUGGUGGCAAUUUCAACCCAUAUUU

3’ GTCTCTAAATGGTAGTCATCCGATATAATCAACAACATAATCACACCTATCACCACCGTTAAAGTTGGGTATAAA

   UCUUAACCAGCCCGUCAUAAACCCAUUUGGGUUAACCCAUUUUCAACCCGCAAUAUAAUGGAUCAAAUGUGAUUU

   AGAATTGGTCGGGCAGTATTTGGGTAAACCCAATTGGGTAAAAGTTGGGCGTTATATTACCTAGTTTACACTAAA

   CAUAUUUCUAUUGGAUUCCUCCUGCUAGCCUAAGCGAGAUAUCUACUGAG 3’

   GTATAAAGATAACCTAAGGAGGACGATCGGATTCGCT 5’
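The first strand shown above is the reverse complement of the RNA template, written with DNA bases. Here is a minimal Python sketch of that relationship; the function name `first_strand_cdna` is only an illustrative helper, not part of any Illumina software.

```python
# Each RNA base and the DNA base that pairs with it in the first cDNA strand.
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(rna_5to3: str) -> str:
    """Return the first cDNA strand (written 5'->3') for an RNA written 5'->3'."""
    # The reverse transcriptase reads the template 3'->5', so reverse it first.
    return "".join(RNA_TO_CDNA[base] for base in reversed(rna_5to3))


# Last 13 nt of the primed region of the example fragment (ends at the hexamer site).
template = "CUAGCCUAAGCGA"
print(first_strand_cdna(template))  # TCGCTTAGGCTAG
```

The output matches the 5′ end of the first strand in the diagram, which starts with the random hexamer (5′ TCGCTT 3′).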

Second strand synthesis

RNase H is used to nick the template mRNA strand by hydrolyzing phosphodiester bonds randomly along the RNA-DNA duplex. This produces new 3′-OH ends available to prime synthesis of the second strand.

5’ CAGAGAUUUACCAUCAGUAGGCU AUAUUAGUUGUUGUAUUAGUGUGGAUAGUGGU GGCAAUUUCAACCCAUA

3’ GTCTCTAAATGGTAGTCATCCGA TATAATCAACAACATAATCACACCTATCACCA CCGTTAAAGTTGGGTAT

   UUUUCU UAACCAGCCCGUCAUAAACCCAUUUGGGUU AACCCAUUUUCAACCCGCAAUAUAAUGGAUCAAAUG

   AAAAGA ATTGGTCGGGCAGTATTTGGGTAAACCCAA TTGGGTAAAAGTTGGGCGTTATATTACCTAGTTTAC

   UGAUUUCAUAUUU CUAUUGGAUUCCUCCUGCUAGCCUAAGCGAGAUAUCUACUGAG 3’

   ACTAAAGTATAAA GATAACCTAAGGAGGACGATCGGATTCGCT 5’

DNA polymerase I synthesizes complementary DNA fragments, starting at each 3′-OH end (5′ → 3′ polymerase activity) while degrading the RNA as it moves forward (5′ → 3′ exonuclease activity). Importantly, the nucleotide mix used for second strand synthesis includes dUTP instead of dTTP. This will be used later to ensure the “strandedness” of the library. After this step, the initial 5′ mRNA fragment remains bound to the cDNA.

5’ CAGAGAUUUACCAUCAGUAGGCU AUAUUAGUUGUUGUAUUAGUGUGGAUAGUGGU GGCAAUUUCAACCCAUA

3’ GTCTCTAAATGGTAGTCATCCGA TATAATCAACAACATAATCACACCTATCACCA CCGTTAAAGTTGGGTAT

   UUUUCU UAACCAGCCCGUCAUAAACCCAUUUGGGUU AACCCAUUUUCAACCCGCAAUAUAAUGGAUCAAAUG

   AAAAGA ATTGGTCGGGCAGTATTTGGGTAAACCCAA TTGGGTAAAAGTTGGGCGTTATATTACCTAGTTTAC

   UGAUUUCAUAUUU CUAUUGGAUUCCUCCUGCUAGCCUAAGCGAGAUAUCUACUGAG 3’

   ACTAAAGTATAAA GATAACCTAAGGAGGACGATCGGATTCGCT 5’
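To make the role of dUTP concrete, here is a small Python sketch that builds the second strand from the first strand and writes U wherever a T would normally be incorporated. The helper name `second_strand_with_dutp` is hypothetical, and the model ignores nicks and the leftover 5′ RNA primer.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def second_strand_with_dutp(first_strand_5to3: str) -> str:
    """Return the second cDNA strand (5'->3') with U incorporated instead of T."""
    complement = "".join(DNA_COMPLEMENT[b] for b in reversed(first_strand_5to3))
    # dUTP replaces dTTP in the nucleotide mix, so every T becomes a U.
    return complement.replace("T", "U")


first_strand = "TCGCTTAGGCTAG"  # short excerpt of the first strand
second_strand = second_strand_with_dutp(first_strand)
print(second_strand)  # CUAGCCUAAGCGA
```

The second strand has the same base sequence as the original mRNA, which is why the diagrams keep showing U on that strand.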

DNA ligase then reestablishes phosphodiester bonds between second strand cDNA fragments.

5’ CAGAGAUUUACCAUCAGUAGGCU AUAUUAGUUGUUGUAUUAGUGUGGAUAGUGGUGGCAAUUUCAACCCAUAU

3’ GTCTCTAAATGGTAGTCATCCGA TATAATCAACAACATAATCACACCTATCACCACCGTTAAAGTTGGGTATA

   UUUCUUAACCAGCCCGUCAUAAACCCAUUUGGGUUAACCCAUUUUCAACCCGCAAUAUAAUGGAUCAAAUGUGA

   AAAGAATTGGTCGGGCAGTATTTGGGTAAACCCAATTGGGTAAAAGTTGGGCGTTATATTACCTAGTTTACACT

   UUUCAUAUUUCUAUUGGAUUCCUCCUGCUAGCCUAAGCGA 3’

   AAAGTATAAAGATAACCTAAGGAGGACGATCGGATTCGCT 5’

The cDNA then undergoes a blunting step, after which the RNA fragment in 5′ is lost.

5’ AUAUUAGUUGUUGUAUUAGUGUGGAUAGUGGUGGCAAUUUCAACCCAUAUUUUCUUAACCAGCCCGUCAUAAAC

3’ TATAATCAACAACATAATCACACCTATCACCACCGTTAAAGTTGGGTATAAAAGAATTGGTCGGGCAGTATTTG

   CCAUUUGGGUUAACCCAUUUUCAACCCGCAAUAUAAUGGAUCAAAUGUGAUUUCAUAUUUCUAUUGGAUUCCUC

   GGTAAACCCAATTGGGTAAAAGTTGGGCGTTATATTACCTAGTTTACACTAAAGTATAAAGATAACCTAAGGAG

   CUGCUAGCCUAAGCGA 3’

   GACGATCGGATTCGCT 5’

If we rotate the cDNA by 180°, we get:

5’ TCGCTTAGGCTAGCAGGAGGAATCCAATAGAAATATGAAATCACATTTGATCCATTATATTGCGGGTTGAAAAT

3’ AGCGAAUCCGAUCGUCCUCCUUAGGUUAUCUUUAUACUUUAGUGUAAACUAGGUAAUAUAACGCCCAACUUUUA

   GGGTTAACCCAAATGGGTTTATGACGGGCTGGTTAAGAAAATATGGGTTGAAATTGCCACCACTATCCACACTA

   CCCAAUUGGGUUUACCCAAAUACUGCCCGACCAAUUCUUUUAUACCCAACUUUAACGGUGGUGAUAGGUGUGAU

   ATACAACAACTAATAT 3’

   UAUGUUGUUGAUUAUA 5’

Adapter ligation

3′ adenylation

The cDNA 3′ ends are adenylated to facilitate subsequent adapter ligation:

5’  TCGCTTAGGCTAGCAGGAGGAATCCAATAGAAATATGAAATCACATTTGATCCATTATATTGCGGGTTGAAAAT

3’ AAGCGAAUCCGAUCGUCCUCCUUAGGUUAUCUUUAUACUUUAGUGUAAACUAGGUAAUAUAACGCCCAACUUUUA

    GGGTTAACCCAAATGGGTTTATGACGGGCTGGTTAAGAAAATATGGGTTGAAATTGCCACCACTATCCACACTA

    CCCAAUUGGGUUUACCCAAAUACUGCCCGACCAAUUCUUUUAUACCCAACUUUAACGGUGGUGAUAGGUGUGAU

    ATACAACAACTAATATA 3’

    UAUGUUGUUGAUUAUA 5’

Sequences of Illumina TruSeq adapters

TruSeq Universal Adapter

5’ AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATC*T 3’

The part underlined will bind to complementary oligos on the flow cell. The asterisk (*) represents a phosphorothioate bond, which protects the final thymidine against exonuclease degradation. This is particularly important, since this thymidine is the one that will hybridize to the adenosine we have just added to the cDNA 3′ ends.

TruSeq Index Adapter (here with index 1)

5’ P-GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG 3’

The part underlined will bind to complementary oligos on the flow cell. The P- represents a phosphate group added to the adapter’s 5′ end to allow ligation to the extra 3′ adenosines on the cDNA. The yellow region is the multiplexing index, a “barcode” that tags fragments from different libraries so that they can be sequenced at the same time. There are 24 indexes:

Index 1   ATCACG    Index 7   CAGATC    Index 13  AGTCAACA    Index 20  GTGGCCTT
Index 2   CGATGT    Index 8   ACTTGA    Index 14  AGTTCCGT    Index 21  GTTTCGGA
Index 3   TTAGGC    Index 9   GATCAG    Index 15  ATGTCAGA    Index 22  CGTACGTA
Index 4   TGACCA    Index 10  TAGCTT    Index 16  CCGTCCCG    Index 23  GAGTGGAT
Index 5   ACAGTG    Index 11  GGCTAC    Index 18  GTCCGCAC    Index 25  ACTGATAT
Index 6   GCCAAT    Index 12  CTTGTA    Index 19  GTGAAACG    Index 27  ATTCCTTT
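As an illustration of how the index sits inside the adapter, the sketch below rebuilds the index adapter from its constant parts and looks up an index read in the table. It only covers the twelve 6-nt indexes; the names `index_adapter` and `identify_index` are hypothetical helpers, and real demultiplexing software also tolerates sequencing errors, which this does not.

```python
from typing import Optional

# Constant parts of the TruSeq index adapter, split around the index shown above.
ADAPTER_PREFIX = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
ADAPTER_SUFFIX = "ATCTCGTATGCCGTCTTCTGCTTG"

# The first twelve indexes from the table above.
INDEXES = {
    1: "ATCACG", 2: "CGATGT", 3: "TTAGGC", 4: "TGACCA",
    5: "ACAGTG", 6: "GCCAAT", 7: "CAGATC", 8: "ACTTGA",
    9: "GATCAG", 10: "TAGCTT", 11: "GGCTAC", 12: "CTTGTA",
}


def index_adapter(index_number: int) -> str:
    """Return the full index adapter sequence (5'->3') for a given index."""
    return ADAPTER_PREFIX + INDEXES[index_number] + ADAPTER_SUFFIX


def identify_index(index_read: str) -> Optional[int]:
    """Return the index number matching an index read exactly, or None."""
    for number, sequence in INDEXES.items():
        if sequence == index_read:
            return number
    return None


print(index_adapter(1))
print(identify_index("ATCACG"))  # 1
```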

Formation of “Y-shaped” adapter complexes

Both adapters are mixed in equal amounts and denatured at 95 °C. The temperature is then lowered to 70 °C in order for the adapters to anneal to each other along a short 12-nt terminal region. This forms “Y-shaped” complexes that will be ligated to both extremities of all cDNA fragments.

      5’ AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATC*T 3’

                                                      ············

3’ GTTCGTCTTCTGCCGTATGCTCTAGCACTACACTGACCTCAAGTCTGCACACGAGAAGGCTAG-P 5’
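The length of the paired stem can be checked from the two adapter sequences alone. The following Python sketch walks from the 3′ end of the universal adapter (leaving the final T as an overhang) and the 5′ end of the index adapter, counting consecutive Watson-Crick pairs. The function name `annealed_length` is an illustrative choice.

```python
UNIVERSAL_ADAPTER = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
INDEX_ADAPTER_1 = "GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG"

WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def annealed_length(universal: str, indexed: str) -> int:
    """Count consecutive base pairs in the stem of the Y-shaped adapter."""
    # The last T of the universal adapter stays unpaired: it is the overhang
    # that will later pair with the A added to the cDNA 3' ends.
    stem = universal[:-1]
    length = 0
    for top, bottom in zip(reversed(stem), indexed):
        if (top, bottom) not in WATSON_CRICK:
            break
        length += 1
    return length


print(annealed_length(UNIVERSAL_ADAPTER, INDEX_ADAPTER_1))  # 12
```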

After the ligation, the “left end” of the cDNA (5′ end of the first strand) looks like this:

      5’ AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTTCGCTTAG… 3’

                                                      ······················

3’ GTTCGTCTTCTGCCGTATGCTCTAGCACTACACTGACCTCAAGTCTGCACACGAGAAGGCTAGAAGCGAAUC… 5’

The “right end” of the cDNA (3′ end of the first strand) looks like this:

5’ …ACTAATATAGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG 3’

   ······················

3’ UGAUUAUATCTAGCCTTCTCGCAGCACATCCCTTTCTCACATCTAGAGCCACCAGCGGCATAGTAA 5’
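The two junctions shown above can be reproduced by simple concatenation. This is a minimal sketch of the strand that carries the first cDNA strand after A-tailing and ligation. The helper name `ligated_first_strand` is hypothetical, and the insert used here is a toy sequence made of the first and last 8 nt of the example first strand, not the full 164-nt insert.

```python
UNIVERSAL_ADAPTER = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
INDEX_ADAPTER_1 = "GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG"


def ligated_first_strand(insert_5to3: str) -> str:
    """Return universal adapter + A-tailed first strand + index adapter (5'->3')."""
    # 3' adenylation adds a single A to the 3' end of the insert.
    a_tailed = insert_5to3 + "A"
    # The universal adapter ends up at the 5' end, the index adapter at the 3' end.
    return UNIVERSAL_ADAPTER + a_tailed + INDEX_ADAPTER_1


toy_insert = "TCGCTTAG" + "ACTAATAT"  # toy insert: both ends of the example first strand
product = ligated_first_strand(toy_insert)

# Left junction: end of the universal adapter followed by the start of the insert.
left = len(UNIVERSAL_ADAPTER)
print(product[left - 6:left + 8])  # CGATCTTCGCTTAG
```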

PCR enrichment

The library can now be amplified by PCR. Here, the TruSeq protocol involves a particular polymerase that stops right after meeting a U base on the template. Since the cDNA second strand was synthesized with Us instead of Ts, only the cDNA first strand can be used as a template for PCR, which makes the library stranded.

The primers used for this PCR are:

Primer 1: identical to the first (5′-most) 44 bases of the universal adapter.

5’ AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGA 3’

Primer 2: complementary to the last (3′-most) 24 bases of the indexed adapter.

5’ CAAGCAGAAGACGGCATACGAGAT 3’
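The relationship between the PCR primers and the adapters can be checked from the sequences themselves. In this short Python sketch, `reverse_complement` is a local helper and the variable names are illustrative.

```python
UNIVERSAL_ADAPTER = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
INDEX_ADAPTER_1 = "GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG"

PCR_PRIMER_1 = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGA"
PCR_PRIMER_2 = "CAAGCAGAAGACGGCATACGAGAT"

DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna))


# Primer 1 has the same sequence as the 5' end of the universal adapter.
primer_1_matches = UNIVERSAL_ADAPTER.startswith(PCR_PRIMER_1)

# Primer 2 pairs with the 3' end of the index adapter.
primer_2_target = INDEX_ADAPTER_1[-len(PCR_PRIMER_2):]
primer_2_matches = reverse_complement(PCR_PRIMER_2) == primer_2_target

print(len(PCR_PRIMER_1), primer_1_matches)  # 44 True
print(len(PCR_PRIMER_2), primer_2_matches)  # 24 True
```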

First round of PCR

At the first round, only primer 2 can anneal to cDNA fragments, at the 3′ end of indexed adapters. The PCR begins along both strands (light gray letters: ATCG), but the presence of dUTPs in the second cDNA strand interrupts the reaction (*). In the end, only the first cDNA strand is completely amplified and available for subsequent PCR rounds.

      3’ TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGAAGCGAATCC

      5’ AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTTCGCTTAGG

3’ GTTCGTCTTCTGCCGTATGCTCTAGCACTACACTGACCTCAAGTCTGCACACGAGAAGGCTAGAAGCGAAUCC

5’ CAAGCAGAAGACGGCATACGAGATCGTGATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTCGCTT*

   GATCGTCCTCCTTAGGTTATCTTTATACTTTAGTGTAAACTAGGTAATATAACGCCCAACTTTTACCCAATTG

   CTAGCAGGAGGAATCCAATAGAAATATGAAATCACATTTGATCCATTATATTGCGGGTTGAAAATGGGTTAAC

   GAUCGUCCUCCUUAGGUUAUCUUUAUACUUUAGUGUAAACUAGGUAAUAUAACGCCCAACUUUUACCCAAUUG

   GGTTTACCCAAATACTGCCCGACCAATTCTTTTATACCCAACTTTAACGGTGGTGATAGGTGTGATTATGTTG

   CCAAATGGGTTTATGACGGGCTGGTTAAGAAAATATGGGTTGAAATTGCCACCACTATCCACACTAATACAAC

   GGUUUACCCAAAUACUGCCCGACCAAUUCUUUUAUACCCAACUUUAACGGUGGUGAUAGGUGUGAUUAUGUUG

   TTGATTATATCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGTAGTGCTAGAGCATACGGCAGAAGACGAAC 5’

   AACTAATATAGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG 3’

   UUGAUUAUATCTAGCCTTCTCGCAGCACATCCCTTTCTCACATCTAGAGCCACCAGCGGCATAGTAA 5’
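The strand selection can be pictured with a toy model of the polymerase, which copies the template base by base and halts at the first U. This is only a sketch of the idea described above, not a model of the real enzyme; `extend_on_template` is a hypothetical helper, and the template is a short excerpt of the adapter-insert junction on the second strand.

```python
from typing import Tuple

DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_on_template(template_3to5: str) -> Tuple[str, bool]:
    """Copy a template read 3'->5' and stop at the first U.

    Returns the product (written 5'->3') and whether the copy was completed.
    """
    product = []
    for base in template_3to5:
        if base == "U":
            # The polymerase cannot go past uracil: extension is aborted here.
            return "".join(product), False
        product.append(DNA_COMPLEMENT[base])
    return "".join(product), True


# Second strand (contains U), read 3'->5': end of the index adapter, then the insert.
print(extend_on_template("GGCTAGAAGCGAAUCC"))  # ('CCGATCTTCGCTT', False)

# First strand (no U), read 3'->5': the copy runs to the end of the template.
print(extend_on_template("GGCTAGATATAATC"))  # ('CCGATCTATATTAG', True)
```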

Subsequent rounds of PCR

Primer 1 can now anneal to the universal adapter. The number of cycles is controlled (~15) to avoid skewing relative abundance of transcripts in the library.

3’ TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGAAGCGAATCCGATCGT

5’ AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTTCGCTTAGGCTAGCA

3’ TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGAAGCGAATCCGATCGT

5’ AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTTCGCTTAGGCTAGCA

   CCTCCTTAGGTTATCTTTATACTTTAGTGTAAACTAGGTAATATAACGCCCAACTTTTACCCAATTGGGTTTA

   GGAGGAATCCAATAGAAATATGAAATCACATTTGATCCATTATATTGCGGGTTGAAAATGGGTTAACCCAAAT

   CCTCCTTAGGTTATCTTTATACTTTAGTGTAAACTAGGTAATATAACGCCCAACTTTTACCCAATTGGGTTTA

   GGAGGAATCCAATAGAAATATGAAATCACATTTGATCCATTATATTGCGGGTTGAAAATGGGTTAACCCAAAT

   CCCAAATACTGCCCGACCAATTCTTTTATACCCAACTTTAACGGTGGTGATAGGTGTGATTATGTTGTTGATT

   GGGTTTATGACGGGCTGGTTAAGAAAATATGGGTTGAAATTGCCACCACTATCCACACTAATACAACAACTAA

   CCCAAATACTGCCCGACCAATTCTTTTATACCCAACTTTAACGGTGGTGATAGGTGTGATTATGTTGTTGATT

   GGGTTTATGACGGGCTGGTTAAGAAAATATGGGTTGAAATTGCCACCACTATCCACACTAATACAACAACTAA

   ATATCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGTAGTGCTAGAGCATACGGCAGAAGACGAAC

   TATAGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG 3’

   ATATCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGTAGTGCTAGAGCATACGGCAGAAGACGAAC 5’

   TATAGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG 3’

In the end, cDNA fragments are oriented: the universal and the indexed adapters are located at the 5′-most and 3′-most regions of the first cDNA strand, respectively.

5’ AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTTCGCTTAGGCTAGCA

3’ TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGAAGCGAATCCGATCGT

   GGAGGAATCCAATAGAAATATGAAATCACATTTGATCCATTATATTGCGGGTTGAAAATGGGTTAACCCAAAT

   CCTCCTTAGGTTATCTTTATACTTTAGTGTAAACTAGGTAATATAACGCCCAACTTTTACCCAATTGGGTTTA

   GGGTTTATGACGGGCTGGTTAAGAAAATATGGGTTGAAATTGCCACCACTATCCACACTAATACAACAACTAA

   CCCAAATACTGCCCGACCAATTCTTTTATACCCAACTTTAACGGTGGTGATAGGTGTGATTATGTTGTTGATT

   TATAGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG 3’

   ATATCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGTAGTGCTAGAGCATACGGCAGAAGACGAAC 5’

Binding to flow cell

The cDNA library can now be introduced in the Illumina flow cell, which is coated with two different kinds of oligos. The bullet points (•••) represent the bond to the flow cell.

Universal adapter oligo: complementary to the first (5′-most) 20 bases of the universal adaptor:

5’ TCGGTGGTCGCCGTATCATT••• 3’

Index adapter oligo: identical to the last (3′-most) 20 bases of the index adaptor.

5’ CGTATGCCGTCTTCTGCTTG••• 3’

At first, cDNAs are bound to the flow cell by both extremities, forming a horseshoe- or bridge-like structure:

•••TTACTATGCCGCTGGTGGCT

5’ AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTTCGCTTAGGCTAGCA

3’ TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGAAGCGAATCCGATCGT

   GGAGGAATCCAATAGAAATATGAAATCACATTTGATCCATTATATTGCGGGTTGAAAATGGGTTAACCCAAAT

   CCTCCTTAGGTTATCTTTATACTTTAGTGTAAACTAGGTAATATAACGCCCAACTTTTACCCAATTGGGTTTA

   GGGTTTATGACGGGCTGGTTAAGAAAATATGGGTTGAAATTGCCACCACTATCCACACTAATACAACAACTAA

   CCCAAATACTGCCCGACCAATTCTTTTATACCCAACTTTAACGGTGGTGATAGGTGTGATTATGTTGTTGATT

   TATAGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG 3’

   ATATCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGTAGTGCTAGAGCATACGGCAGAAGACGAAC 5’

                                                  CGTATGCCGTCTTCTGCTTG•••

Bridge PCR (not covered here) forms clusters of flow cell-bound cDNA molecules that correspond to the same initial mRNA fragment. After denaturation of cDNA double strands, bridge-like structures are linearized. Clusters are then made of a balanced mix of two types of fragments, used for the first and second round of sequencing, respectively.

Fragment containing the second cDNA strand, used for the first round of sequencing:

3’ TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGAAGCGAATCCGATCGT

   CCTCCTTAGGTTATCTTTATACTTTAGTGTAAACTAGGTAATATAACGCCCAACTTTTACCCAATTGGGTTTA

   CCCAAATACTGCCCGACCAATTCTTTTATACCCAACTTTAACGGTGGTGATAGGTGTGATTATGTTGTTGATT

   ATATCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGTAGTGCTAGAGCATACGGCAGAAGACGAAC 5’

                                                  CGTATGCCGTCTTCTGCTTG•••

Fragment containing the first cDNA strand, used for the second round of sequencing:

3’ GTTCGTCTTCTGCCGTATGCTCTAGCACTACACTGACCTCAAGTCTGCACACGAGAAGGCTAGATATAATCAA

   CAACATAATCACACCTATCACCACCGTTAAAGTTGGGTATAAAAGAATTGGTCGGGCAGTATTTGGGTAAACC

   CAATTGGGTAAAAGTTGGGCGTTATATTACCTAGTTTACACTAAAGTATAAAGATAACCTAAGGAGGACGATC

   GGATTCGCTTCTAGCCTTCTCGCAGCACATCCCTTTCTCACATCTAGAGCCACCAGCGGCATAGTAA 5’

                                                  TCGGTGGTCGCCGTATCATT•••

Sequencing

Illumina TruSeq sequencing primers

Read 1 primer: identical to the last (3′-most) 33 bases of the universal adaptor:

5’ ACACTCTTTCCCTACACGACGCTCTTCCGATCT 3’

This primer anneals to the second cDNA strand and primes the synthesis of a strand that is sense relative to the first cDNA strand. Read 1 will therefore be antisense relative to the initial mRNA molecule.

Read 2 primer: complementary to the first (5′-most) 33 bases of the indexed adaptor, followed by a thymidine:

5’ GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT 3’

This primer anneals to the first cDNA strand and primes the synthesis of a strand that is sense relative to the second cDNA strand. Read 2 will therefore be sense relative to the initial mRNA molecule.

Index read primer: identical to the first (5′-most) 33 bases of the indexed adaptor:

5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCAC 3’

To sum up, the 3 sequencing primers anneal to the cDNA fragment in the following way:

5’ AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTTCGCTTAGGCTAGCA…

3’ TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGAAGCGAATCCGATCGT…

                         5’ ACACTCTTTCCCTACACGACGCTCTTCCGATCT 3’ →

  ← 3’ TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG 5’

   …TATAGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG 3’

   …ATATCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGTAGTGCTAGAGCATACGGCAGAAGACGAAC 5’

     5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCAC 3’ →
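How each sequencing primer relates to the adapters can be verified from the sequences printed in this post. The Python sketch below does this; all variable names are illustrative, and `reverse_complement` is a local helper.

```python
UNIVERSAL_ADAPTER = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
INDEX_ADAPTER_1 = "GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG"

READ_1_PRIMER = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
READ_2_PRIMER = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"
INDEX_READ_PRIMER = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"

DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna))


# Read 1 primer: same sequence as the 3' end of the universal adapter.
read_1_ok = UNIVERSAL_ADAPTER.endswith(READ_1_PRIMER)

# Read 2 primer: pairs with the 5' end of the index adapter, plus a T that
# faces the A added during 3' adenylation.
read_2_ok = READ_2_PRIMER == reverse_complement(INDEX_ADAPTER_1[:33]) + "T"

# Index read primer: same sequence as the 5' end of the index adapter,
# so the next bases synthesized are the index itself.
index_ok = INDEX_ADAPTER_1.startswith(INDEX_READ_PRIMER)
next_bases = INDEX_ADAPTER_1[len(INDEX_READ_PRIMER):len(INDEX_READ_PRIMER) + 6]

print(read_1_ok, read_2_ok, index_ok)  # True True True
print(next_bases)  # ATCACG
```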

First round of sequencing

Read 1 primer anneals to the universal adapter and the sequencing reaction is performed for a specific number of rounds (here, 100 rounds for a final 100-nt read). After the generation of read 1 (green letters: ATCG), the newly synthesized strand is discarded.

3’ TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGAAGCGAATCCGATCGT

                         5’ ACACTCTTTCCCTACACGACGCTCTTCCGATCTTCGCTTAGGCTAGCA

   CCTCCTTAGGTTATCTTTATACTTTAGTGTAAACTAGGTAATATAACGCCCAACTTTTACCCAATTGGGTTTA

   GGAGGAATCCAATAGAAATATGAAATCACATTTGATCCATTATATTGCGGGTTGAAAATGGGTTAACCCAAAT

   CCCAAATACTGCCCGACCAATTCTTTTATACCCAACTTTAACGGTGGTGATAGGTGTGATTATGTTGTTGATT

   GGGTTTATGACG 3’

   ATATCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGTAGTGCTAGAGCATACGGCAGAAGACGAAC 5’

                                                  CGTATGCCGTCTTCTGCTTG•••

This is how read 1 sequence will appear in the FASTQ file:

TCGCTTAGGCTAGCAGGAGGAATCCAATAGAAATATGAAATCACATTTGATCCATTATATTGCGGGTTGAAAATGG

GTTAACCCAAATGGGTTTATGACG

If the insert is too short, read 1 may span over the indexed adaptor sequence. Therefore, the right end of read 1 sequences must be trimmed to remove the following sequence:

AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG
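To illustrate what trimming means here, the sketch below removes the adapter from the 3′ end of a read using exact matching only. It is a simplified illustration, not how dedicated trimming tools work internally (they also handle mismatches and quality scores); the function name `trim_adapter` and the `min_overlap` parameter are assumptions made for this example.

```python
# Constant start of the sequence to remove from read 1 (up to the index).
READ_1_ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"


def trim_adapter(read: str, adapter: str = READ_1_ADAPTER, min_overlap: int = 5) -> str:
    """Remove an exact, possibly partial, adapter match from the 3' end of a read."""
    for start in range(len(read)):
        tail = read[start:]
        overlap = min(len(tail), len(adapter))
        if overlap < min_overlap:
            # Too few bases left to call an adapter match with any confidence.
            break
        if tail[:overlap] == adapter[:overlap]:
            return read[:start]
    return read


# A 20-nt insert read through into the first 15 nt of the adapter.
short_insert = "TCGCTTAGGCTAGCAGGAGG"
read = short_insert + READ_1_ADAPTER[:15]
print(trim_adapter(read))  # TCGCTTAGGCTAGCAGGAGG
```

A read that contains no adapter is returned unchanged.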

Index sequencing

If multiplexing is used in the sequencing experiment, the index read primer is annealed to the same second cDNA strand-containing fragment. A short sequencing reaction is performed to get the index sequence (yellow letters ATCACG). The newly synthesized strand is then discarded.

3’ TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGAAGCGAATCCGATCGT

   CCTCCTTAGGTTATCTTTATACTTTAGTGTAAACTAGGTAATATAACGCCCAACTTTTACCCAATTGGGTTTA

   CCCAAATACTGCCCGACCAATTCTTTTATACCCAACTTTAACGGTGGTGATAGGTGTGATTATGTTGTTGATT

   ATATCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGTAGTGCTAGAGCATACGGCAGAAGACGAAC 5’

    5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACG    CGTATGCCGTCTTCTGCTTG•••

Second round of sequencing

If paired-end sequencing is performed, read 2 primer is annealed to the indexed adapter on the fragment containing the first cDNA strand. Again, the sequencing reaction is performed for a specific number of rounds (here, 100 rounds for a final 100-nt read). After the generation of read 2 (green letters: ATCG), the newly synthesized strand is discarded.

3’ GTTCGTCTTCTGCCGTATGCTCTAGCACTACACTGACCTCAAGTCTGCACACGAGAAGGCTAGATATAATCAA

                              5’ GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTATATTAGTT

   CAACATAATCACACCTATCACCACCGTTAAAGTTGGGTATAAAAGAATTGGTCGGGCAGTATTTGGGTAAACC

   GTTGTATTAGTGTGGATAGTGGTGGCAATTTCAACCCATATTTTCTTAACCAGCCCGTCATAAACCCATTTGG

   CAATTGGGTAAAAGTTGGGCGTTATATTACCTAGTTTACACTAAAGTATAAAGATAACCTAAGGAGGACGATC

   GTTAACCCATTTTCAACC 3’

   GGATTCGCTTCTAGCCTTCTCGCAGCACATCCCTTTCTCACATCTAGAGCCACCAGCGGCATAGTAA 5’

                                                  TCGGTGGTCGCCGTATCATT•••

This is how read 2 sequence will appear in the FASTQ file:

ATATTAGTTGTTGTATTAGTGTGGATAGTGGTGGCAATTTCAACCCATATTTTCTTAACCAGCCCGTCATAAACCC

ATTTGGGTTAACCCATTTTCAACC

Again, if the insert is too short, read 2 may span over the universal adaptor sequence. Therefore, the right end of read 2 sequences must be trimmed to remove the following sequence:

AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT

Conclusion

The sequencing reads 1 and 2 are then arranged as follows:

5’ GAACUGAUAAAGAGGAAACUAAAGCCACCCCAGAGUAUUUCUGAUUGCAGCCAUUGGUGCCUGCCUGGAAUGCCC

   GAUACAUGGAAUAGGUUACUAUAUGCAUCUCUGCUUUUGGAUCACCCCAUUGAUUCCCCUAGUCUAUCACAUUUG

   GUCUCUUGAAAAUCAGAGAUUUACCAUCAGUAGGCUAUAUUAGUUGUUGUAUUAGUGUGGAUAGUGGUGGCAAUU

                              Read 2 → ATATTAGTTGTTGTATTAGTGTGGATAGTGGTGGCAATT

   UCAACCCAUAUUUUCUUAACCAGCCCGUCAUAAACCCAUUUGGGUUAACCCAUUUUCAACCCGCAAUAUAAUGGA

   TCAACCCATATTTTCTTAACCAGCCCGTCATAAACCCATTTGGGTTAACCCATTTTCAACC

                            CGTCATAAACCCATTTGGGTTAACCCATTTTCAACCCGCAATATAATGGA

   UCAAAUGUGAUUUCAUAUUUCUAUUGGAUUCCUCCUGCUAGCCUAAGCGAGAUAUCUACUGAGGGAACUUUUGAA

   TCAAATGTGATTTCATATTTCTATTGGATTCCTCCTGCTAGCCTAAGCGA ← Read 1

   CUUUCAUAUGAGUGUAUAAACGUGUAUGUUGAUUGUUAAAAAAAAAAAAAAAAAA 3’

Reads 1 and 2 are antisense and sense relative to the initial mRNA, respectively. Therefore, the library orientation is “reverse-forward”. This is why, for instance, option --SS_lib_type RF must be used when performing a de novo assembly with the Trinity assembler.
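The orientation of both reads can be summarized in a few lines of Python. The sketch below takes the 164-nt insert that survived the blunting step (in mRNA sense) and derives the two 100-nt reads; `paired_reads` is a hypothetical helper name, and adapters and sequencing errors are ignored.

```python
# Insert after the blunting step, written 5'->3' in mRNA sense.
INSERT_RNA = (
    "AUAUUAGUUGUUGUAUUAGUGUGGAUAGUGGUGGCAAUUUCAACCCAUAUUUUCUUAACCAGCCCGUCAUAAAC"
    "CCAUUUGGGUUAACCCAUUUUCAACCCGCAAUAUAAUGGAUCAAAUGUGAUUUCAUAUUUCUAUUGGAUUCCUC"
    "CUGCUAGCCUAAGCGA"
)

DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna))


def paired_reads(insert_rna: str, read_length: int = 100):
    """Return (read 1, read 2) for a stranded dUTP library."""
    sense_dna = insert_rna.replace("U", "T")
    # Read 1 has the sequence of the first cDNA strand: antisense to the mRNA.
    read_1 = reverse_complement(sense_dna)[:read_length]
    # Read 2 has the sequence of the second cDNA strand: sense to the mRNA.
    read_2 = sense_dna[:read_length]
    return read_1, read_2


read_1, read_2 = paired_reads(INSERT_RNA)
print(read_1[:20])  # TCGCTTAGGCTAGCAGGAGG
print(read_2[:20])  # ATATTAGTTGTTGTATTAGT
```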