Introduction
This post explains how Illumina paired-end sequencing reads are produced with the TruSeq Stranded mRNA kit. The information comes from the Sample Preparation Guide and the Reference Guide provided by Illumina, as well as from a detailed document from Tufts University. The RNA-seqlopedia, edited by researchers from the University of Oregon, is another gold mine of information for those interested in all aspects of RNA-seq technologies.
All oligonucleotide sequences given in this post (primers, adapters, flow cell-bound oligos) are proprietary to Illumina. They are subject to the following legal copyright notice:
Oligonucleotide sequences © 2018 Illumina, Inc. All rights reserved. Derivative works created by Illumina customers are authorized for use with Illumina instruments and products only. All other uses are strictly prohibited.
RNA cleavage and priming
Consider a fictional mRNA having the following sequence:
5’ GAACUGAUAAAGAGGAAACUAAAGCCACCCCAGAGUAUUUCUGAUUGCAGCCAUUGGUGCCUGCCUGGAAUGCCC
GAUACAUGGAAUAGGUUACUAUAUGCAUCUCUGCUUUUGGAUCACCCCAUUGAUUCCCCUAGUCUAUCACAUUUG
GUCUCUUGAAAAUCAGAGAUUUACCAUCAGUAGGCUAUAUUAGUUGUUGUAUUAGUGUGGAUAGUGGUGGCAAUU
UCAACCCAUAUUUUCUUAACCAGCCCGUCAUAAACCCAUUUGGGUUAACCCAUUUUCAACCCGCAAUAUAAUGGA
UCAAAUGUGAUUUCAUAUUUCUAUUGGAUUCCUCCUGCUAGCCUAAGCGAGAUAUCUACUGAGGGAACUUUUGAA
CUUUCAUAUGAGUGUAUAAACGUGUAUGUUGAUUGUUAAAAAAAAAAAAAAAAAA 3’
RNAs are first heated (8 min at 94 °C) to generate fragments of random sizes. Let’s work with the following 200-nt fragment in the rest of this post:
5’ CAGAGAUUUACCAUCAGUAGGCUAUAUUAGUUGUUGUAUUAGUGUGGAUAGUGGUGGCAAUUUCAACCCAUAUUU
UCUUAACCAGCCCGUCAUAAACCCAUUUGGGUUAACCCAUUUUCAACCCGCAAUAUAAUGGAUCAAAUGUGAUUU
CAUAUUUCUAUUGGAUUCCUCCUGCUAGCCUAAGCGAGAUAUCUACUGAG 3’
mRNA fragments are hybridized to a mix of random hexamers, which will be used as primers for the cDNA first strand synthesis:
5’ CAGAGAUUUACCAUCAGUAGGCUAUAUUAGUUGUUGUAUUAGUGUGGAUAGUGGUGGCAAUUUCAACCCAUAUUU
UCUUAACCAGCCCGUCAUAAACCCAUUUGGGUUAACCCAUUUUCAACCCGCAAUAUAAUGGAUCAAAUGUGAUUU
CAUAUUUCUAUUGGAUUCCUCCUGCUAGCCUAAGCGAGAUAUCUACUGAG 3’
← 3’ TTCGCT 5’
When sequencing mRNAs, random primers give a more uniform distribution of fragments along RNA molecules. Oligo-dT primers would lead to an over-representation of polyA-containing fragments stemming from the 3′ ends of mRNAs. Even if fragmentation were performed after oligo-dT-primed cDNA synthesis, we would still get a 3′-enriched distribution of fragments, due to the low processivity of reverse transcriptases.
Reverse transcription
First strand synthesis
The SuperScript II reverse transcriptase is used to synthesize the first cDNA strand, which is complementary to the initial mRNA. Actinomycin D is included to prevent DNA-dependent DNA synthesis.
5’ CAGAGAUUUACCAUCAGUAGGCUAUAUUAGUUGUUGUAUUAGUGUGGAUAGUGGUGGCAAUUUCAACCCAUAUUU
3’ GTCTCTAAATGGTAGTCATCCGATATAATCAACAACATAATCACACCTATCACCACCGTTAAAGTTGGGTATAAA
UCUUAACCAGCCCGUCAUAAACCCAUUUGGGUUAACCCAUUUUCAACCCGCAAUAUAAUGGAUCAAAUGUGAUUU
AGAATTGGTCGGGCAGTATTTGGGTAAACCCAATTGGGTAAAAGTTGGGCGTTATATTACCTAGTTTACACTAAA
CAUAUUUCUAUUGGAUUCCUCCUGCUAGCCUAAGCGAGAUAUCUACUGAG 3’
GTATAAAGATAACCTAAGGAGGACGATCGGATTCGCT 5’
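Priming and first strand synthesis can be mimicked in a few lines of Python. This is only a sketch: the helper names (`revcomp_dna`, `first_strand`) are made up for illustration, the template is limited to the last 50 nt of the fragment, and the hexamer is assumed to anneal at a single, perfectly matching site, whereas real priming tolerates mismatches.

```python
# Sketch: where a hexamer anneals and which first-strand cDNA it primes.
# Helper names are illustrative and not part of any library.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}


def revcomp_dna(seq):
    """Reverse complement of an RNA or DNA string, written as DNA (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def first_strand(rna, primer):
    """cDNA (5'->3') obtained by extending `primer` along `rna` (5'->3')."""
    # RNA bases the primer pairs with (assumption: perfect match only).
    site = revcomp_dna(primer).replace("T", "U")
    start = rna.find(site)
    if start == -1:
        return None
    # The enzyme copies the template from the primer site to the RNA 5' end.
    return revcomp_dna(rna[: start + len(primer)])


# Last 50 nt of the 200-nt fragment, and the hexamer drawn above (5'->3').
rna_tail = "CAUAUUUCUAUUGGAUUCCUCCUGCUAGCCUAAGCGAGAUAUCUACUGAG"
hexamer = "TCGCTT"

cdna = first_strand(rna_tail, hexamer)
print(cdna)
```

The printed strand starts with the hexamer itself, and the RNA bases located 3′ of the priming site are not copied.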
Second strand synthesis
RNase H is used to nick the template mRNA strand by hydrolyzing phosphodiester bonds randomly along the RNA-DNA duplex. This produces new 3′-OH ends available to prime synthesis of the second strand.
5’ CAGAGAUUUACCAUCAGUAGGCU AUAUUAGUUGUUGUAUUAGUGUGGAUAGUGGU GGCAAUUUCAACCCAUA
3’ GTCTCTAAATGGTAGTCATCCGA TATAATCAACAACATAATCACACCTATCACCA CCGTTAAAGTTGGGTAT
UUUUCU UAACCAGCCCGUCAUAAACCCAUUUGGGUU AACCCAUUUUCAACCCGCAAUAUAAUGGAUCAAAUG
AAAAGA ATTGGTCGGGCAGTATTTGGGTAAACCCAA TTGGGTAAAAGTTGGGCGTTATATTACCTAGTTTAC
UGAUUUCAUAUUU CUAUUGGAUUCCUCCUGCUAGCCUAAGCGAGAUAUCUACUGAG 3’
ACTAAAGTATAAA GATAACCTAAGGAGGACGATCGGATTCGCT 5’
DNA polymerase I synthesizes complementary DNA fragments, starting at each 3′-OH end (5′ → 3′ polymerase activity) while degrading the RNA as it moves forward (5′ → 3′ exonuclease activity). Importantly, the nucleotide mix used for second strand synthesis includes dUTP instead of dTTP. This will be used later to ensure the “strandedness” of the library. After this step, the initial 5′ mRNA remains bound to the cDNA.
5’ CAGAGAUUUACCAUCAGUAGGCU AUAUUAGUUGUUGUAUUAGUGUGGAUAGUGGU GGCAAUUUCAACCCAUA
3’ GTCTCTAAATGGTAGTCATCCGA TATAATCAACAACATAATCACACCTATCACCA CCGTTAAAGTTGGGTAT
UUUUCU UAACCAGCCCGUCAUAAACCCAUUUGGGUU AACCCAUUUUCAACCCGCAAUAUAAUGGAUCAAAUG
AAAAGA ATTGGTCGGGCAGTATTTGGGTAAACCCAA TTGGGTAAAAGTTGGGCGTTATATTACCTAGTTTAC
UGAUUUCAUAUUU CUAUUGGAUUCCUCCUGCUAGCCUAAGCGAGAUAUCUACUGAG 3’
ACTAAAGTATAAA GATAACCTAAGGAGGACGATCGGATTCGCT 5’
DNA ligase then reestablishes phosphodiester bonds between second strand cDNA fragments.
5’ CAGAGAUUUACCAUCAGUAGGCU AUAUUAGUUGUUGUAUUAGUGUGGAUAGUGGUGGCAAUUUCAACCCAUAU
3’ GTCTCTAAATGGTAGTCATCCGA TATAATCAACAACATAATCACACCTATCACCACCGTTAAAGTTGGGTATA
UUUCUUAACCAGCCCGUCAUAAACCCAUUUGGGUUAACCCAUUUUCAACCCGCAAUAUAAUGGAUCAAAUGUGA
AAAGAATTGGTCGGGCAGTATTTGGGTAAACCCAATTGGGTAAAAGTTGGGCGTTATATTACCTAGTTTACACT
UUUCAUAUUUCUAUUGGAUUCCUCCUGCUAGCCUAAGCGA 3’
AAAGTATAAAGATAACCTAAGGAGGACGATCGGATTCGCT 5’
The cDNA then undergoes a blunting step, after which the RNA fragment at the 5′ end is lost.
5’ AUAUUAGUUGUUGUAUUAGUGUGGAUAGUGGUGGCAAUUUCAACCCAUAUUUUCUUAACCAGCCCGUCAUAAAC
3’ TATAATCAACAACATAATCACACCTATCACCACCGTTAAAGTTGGGTATAAAAGAATTGGTCGGGCAGTATTTG
CCAUUUGGGUUAACCCAUUUUCAACCCGCAAUAUAAUGGAUCAAAUGUGAUUUCAUAUUUCUAUUGGAUUCCUC
GGTAAACCCAATTGGGTAAAAGTTGGGCGTTATATTACCTAGTTTACACTAAAGTATAAAGATAACCTAAGGAG
CUGCUAGCCUAAGCGA 3’
GACGATCGGATTCGCT 5’
If we rotate the cDNA by 180°, we get:
5’ TCGCTTAGGCTAGCAGGAGGAATCCAATAGAAATATGAAATCACATTTGATCCATTATATTGCGGGTTGAAAAT
3’ AGCGAAUCCGAUCGUCCUCCUUAGGUUAUCUUUAUACUUUAGUGUAAACUAGGUAAUAUAACGCCCAACUUUUA
GGGTTAACCCAAATGGGTTTATGACGGGCTGGTTAAGAAAATATGGGTTGAAATTGCCACCACTATCCACACTA
CCCAAUUGGGUUUACCCAAAUACUGCCCGACCAAUUCUUUUAUACCCAACUUUAACGGUGGUGAUAGGUGUGAU
ATACAACAACTAATAT 3’
UAUGUUGUUGAUUAUA 5’
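The 180° rotation simply means reading each strand from its other end. A minimal sketch, assuming a duplex is stored the way the diagrams draw it (top strand 5′ → 3′, bottom strand 3′ → 5′); the names `rotate` and `is_paired` are hypothetical:

```python
# Sketch: a duplex is a pair (top 5'->3', bottom 3'->5'), as in the diagrams.
# U stands for the dU bases of the second strand.

PAIRS = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C"), ("A", "U"), ("U", "A")}


def rotate(duplex):
    """Turn the duplex by 180 degrees: the bottom strand becomes the top one."""
    top, bottom = duplex
    return bottom[::-1], top[::-1]


def is_paired(duplex):
    """True if every column of the duplex is a Watson-Crick pair."""
    top, bottom = duplex
    return len(top) == len(bottom) and all(
        pair in PAIRS for pair in zip(top, bottom)
    )


# Right-hand end of the blunted cDNA (dU-containing second strand on top).
before = ("CUGCUAGCCUAAGCGA", "GACGATCGGATTCGCT")
after = rotate(before)

print(after)
print(is_paired(before), is_paired(after))
```

After rotation the first cDNA strand is on top, starting with the random hexamer sequence TCGCTT.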
Adapter ligation
3′ adenylation
The cDNA 3′ ends are adenylated to facilitate subsequent adapter ligation:
5’ TCGCTTAGGCTAGCAGGAGGAATCCAATAGAAATATGAAATCACATTTGATCCATTATATTGCGGGTTGAAAAT
3’ AAGCGAAUCCGAUCGUCCUCCUUAGGUUAUCUUUAUACUUUAGUGUAAACUAGGUAAUAUAACGCCCAACUUUUA
GGGTTAACCCAAATGGGTTTATGACGGGCTGGTTAAGAAAATATGGGTTGAAATTGCCACCACTATCCACACTA
CCCAAUUGGGUUUACCCAAAUACUGCCCGACCAAUUCUUUUAUACCCAACUUUAACGGUGGUGAUAGGUGUGAU
ATACAACAACTAATATA 3’
UAUGUUGUUGAUUAUA 5’
Sequences of Illumina TruSeq adapters
TruSeq Universal Adapter
5’ AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATC*T 3’
The 5′-most 20 bases (AATGATACGGCGACCACCGA) will bind to complementary oligos on the flow cell. The asterisk (*) represents a phosphorothioate bond, which protects the final thymidine against exonuclease degradation. This is particularly important, since this thymidine is the one that will pair with the adenosine we have just added to the cDNA 3′ ends.
TruSeq Index Adapter (here with index 1)
5’ P-GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG 3’
The 3′-most 20 bases (CGTATGCCGTCTTCTGCTTG) form the flow cell-binding region. The P- represents a phosphate group added to the adapter’s 5′ end to allow ligation to the extra 3′ adenosines on the cDNA. The six bases ATCACG (positions 34 to 39) are the multiplexing index, a “barcode” that specifically tags fragments from different libraries sequenced at the same time. There are 24 indexes:
Index 1 | ATCACG | Index 7 | CAGATC | Index 13 | AGTCAACA | Index 20 | GTGGCCTT
Index 2 | CGATGT | Index 8 | ACTTGA | Index 14 | AGTTCCGT | Index 21 | GTTTCGGA
Index 3 | TTAGGC | Index 9 | GATCAG | Index 15 | ATGTCAGA | Index 22 | CGTACGTA
Index 4 | TGACCA | Index 10 | TAGCTT | Index 16 | CCGTCCCG | Index 23 | GAGTGGAT
Index 5 | ACAGTG | Index 11 | GGCTAC | Index 18 | GTCCGCAC | Index 25 | ACTGATAT
Index 6 | GCCAAT | Index 12 | CTTGTA | Index 19 | GTGAAACG | Index 27 | ATTCCTTT
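To illustrate how such barcodes are used after sequencing, here is a hedged sketch of demultiplexing. It only includes the twelve 6-base indexes listed above, and both the function name `assign_index` and the tolerance of one mismatch are assumptions made for the example, not the behavior of Illumina's software.

```python
# Sketch: assign an index read to a library, tolerating one sequencing error.
# Only the twelve 6-base indexes (1 to 12) are listed here.

INDEXES = {
    "Index 1": "ATCACG", "Index 2": "CGATGT", "Index 3": "TTAGGC",
    "Index 4": "TGACCA", "Index 5": "ACAGTG", "Index 6": "GCCAAT",
    "Index 7": "CAGATC", "Index 8": "ACTTGA", "Index 9": "GATCAG",
    "Index 10": "TAGCTT", "Index 11": "GGCTAC", "Index 12": "CTTGTA",
}


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def assign_index(index_read, max_mismatches=1):
    """Name of the only index within `max_mismatches` of the read, else None."""
    hits = [
        name
        for name, seq in INDEXES.items()
        if len(seq) == len(index_read)
        and hamming(seq, index_read) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


print(assign_index("ATCACG"))  # exact match
print(assign_index("ATCACC"))  # one mismatch
print(assign_index("GGGGGG"))  # no index close enough
```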
Formation of “Y-shaped” adapter complexes
Both adapters are mixed in equal amounts and denatured at 95 °C. The temperature is then lowered to 70 °C in order for the adapters to anneal to each other along a short 12-nt terminal region. This forms “Y-shaped” complexes that will be ligated to both extremities of all cDNA fragments.
5’ AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATC*T 3’
············
3’ GTTCGTCTTCTGCCGTATGCTCTAGCACTACACTGACCTCAAGTCTGCACACGAGAAGGCTAG-P 5’
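The length of the annealed stem can be recomputed from the two adapter sequences. The sketch below assumes that the final thymidine of the universal adapter stays unpaired as an overhang; the function name `annealed_length` is made up for the example.

```python
# Sketch: length of the double-stranded stem of the Y-shaped adapter.

UNIVERSAL = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
INDEXED = "GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def annealed_length(universal, indexed):
    """Bases of the indexed adapter 5' end pairing with the universal 3' end."""
    stem = universal[:-1]  # assumption: the 3' T is left as an overhang
    limit = min(len(stem), len(indexed))
    length = 0
    # Grow the stem one base at a time until the next base no longer pairs.
    while length < limit and stem.endswith(revcomp(indexed[: length + 1])):
        length += 1
    return length


stem_length = annealed_length(UNIVERSAL, INDEXED)
print(stem_length, INDEXED[:stem_length], UNIVERSAL[-1])
```

Everything outside this stem is non-complementary, which is what gives the complex its two free arms.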
After the ligation, the “left end” of the cDNA (5′ end of the first strand) looks like this:
5’ AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTTCGCTTAG… 3’
······················
3’ GTTCGTCTTCTGCCGTATGCTCTAGCACTACACTGACCTCAAGTCTGCACACGAGAAGGCTAGAAGCGAAUC… 5’
The “right end” of the cDNA (3′ end of the first strand) looks like this:
5’ …ACTAATATAGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG 3’
······················
3’ …UGAUUAUATCTAGCCTTCTCGCAGCACATCCCTTTCTCACATCTAGAGCCACCAGCGGCATAGTAA 5’
PCR enrichment
The library can now be amplified by PCR. Here, the TruSeq protocol uses a polymerase that stalls when it meets a U base on the template. Since the second cDNA strand was synthesized with Us instead of Ts, only the first cDNA strand can serve as a template for PCR, which makes the library stranded.
The primers used for this PCR are:
Primer 1: identical to the first (5′-most) 44 bases of the universal adapter.
5’ AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGA 3’
Primer 2: complementary to the last (3′-most) 24 bases of the indexed adapter.
5’ CAAGCAGAAGACGGCATACGAGAT 3’
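The relationship between the PCR primers and the adapters can be checked with a short script. This is a sketch under simplifying assumptions: a primer is considered able to anneal only if its exact reverse complement is present in the template, and the ligated strand is modeled as universal adapter + a toy 16-nt insert + the extra A + indexed adapter. The name `can_anneal` is hypothetical.

```python
# Sketch: how the PCR primers relate to the adapters, and which primer
# finds a binding site on a freshly ligated strand.

UNIVERSAL = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
INDEXED = "GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG"
PRIMER_1 = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGA"
PRIMER_2 = "CAAGCAGAAGACGGCATACGAGAT"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def can_anneal(primer, template):
    """True if `template` contains the exact binding site of `primer`."""
    return revcomp(primer) in template


# Primer 1 is a prefix of the universal adapter; primer 2 is the reverse
# complement of the 3' end of the indexed adapter.
print(len(PRIMER_1), UNIVERSAL.startswith(PRIMER_1))
print(len(PRIMER_2), PRIMER_2 == revcomp(INDEXED[-len(PRIMER_2):]))

# Toy ligated strand: universal adapter, insert, A-tail, indexed adapter.
ligated = UNIVERSAL + "TCGCTTAGGCTAGCAG" + "A" + INDEXED
first_copy = revcomp(ligated)  # product of the first PCR round

print(can_anneal(PRIMER_1, ligated), can_anneal(PRIMER_2, ligated))
print(can_anneal(PRIMER_1, first_copy))
```

Primer 1 has no binding site until the complement of the universal adapter has been synthesized, which only happens once primer 2 has been extended.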
First round of PCR
At the first round, only primer 2 can anneal to cDNA fragments, at the 3′ end of indexed adapters. Extension begins along both strands, but the dU bases in the second cDNA strand interrupt the reaction (*). In the end, only the first cDNA strand is completely copied and available for subsequent PCR rounds.
3’ TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGAAGCGAATCC
5’ AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTTCGCTTAGG
3’ GTTCGTCTTCTGCCGTATGCTCTAGCACTACACTGACCTCAAGTCTGCACACGAGAAGGCTAGAAGCGAAUCC
5’ CAAGCAGAAGACGGCATACGAGATCGTGATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTATACTT*
GATCGTCCTCCTTAGGTTATCTTTATACTTTAGTGTAAACTAGGTAATATAACGCCCAACTTTTACCCAATTG
CTAGCAGGAGGAATCCAATAGAAATATGAAATCACATTTGATCCATTATATTGCGGGTTGAAAATGGGTTAAC
GAUCGUCCUCCUUAGGUUAUCUUUAUACUUUAGUGUAAACUAGGUAAUAUAACGCCCAACUUUUACCCAAUUG
GGTTTACCCAAATACTGCCCGACCAATTCTTTTATACCCAACTTTAACGGTGGTGATAGGTGTGATTATGTTG
CCAAATGGGTTTATGACGGGCTGGTTAAGAAAATATGGGTTGAAATTGCCACCACTATCCACACTAATACAAC
GGUUUACCCAAAUACUGCCCGACCAAUUCUUUUAUACCCAACUUUAACGGUGGUGAUAGGUGUGAUUAUGUUG
TTGATTATATCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGTAGTGCTAGAGCATACGGCAGAAGACGAAC 5’
AACTAATATAGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG 3’
UUGAUUAUATCTAGCCTTCTCGCAGCACATCCCTTTCTCACATCTAGAGCCACCAGCGGCATAGTAA 5’
Subsequent rounds of PCR
Primer 1 can now anneal to the universal adapter. The number of cycles is controlled (~15) to avoid skewing relative abundance of transcripts in the library.
3’ TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGAAGCGAATCCGATCGT
5’ AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTTCGCTTAGGCTAGCA
3’ TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGAAGCGAATCCGATCGT
5’ AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTTCGCTTAGGCTAGCA
CCTCCTTAGGTTATCTTTATACTTTAGTGTAAACTAGGTAATATAACGCCCAACTTTTACCCAATTGGGTTTA
GGAGGAATCCAATAGAAATATGAAATCACATTTGATCCATTATATTGCGGGTTGAAAATGGGTTAACCCAAAT
CCTCCTTAGGTTATCTTTATACTTTAGTGTAAACTAGGTAATATAACGCCCAACTTTTACCCAATTGGGTTTA
GGAGGAATCCAATAGAAATATGAAATCACATTTGATCCATTATATTGCGGGTTGAAAATGGGTTAACCCAAAT
CCCAAATACTGCCCGACCAATTCTTTTATACCCAACTTTAACGGTGGTGATAGGTGTGATTATGTTGTTGATT
GGGTTTATGACGGGCTGGTTAAGAAAATATGGGTTGAAATTGCCACCACTATCCACACTAATACAACAACTAA
CCCAAATACTGCCCGACCAATTCTTTTATACCCAACTTTAACGGTGGTGATAGGTGTGATTATGTTGTTGATT
GGGTTTATGACGGGCTGGTTAAGAAAATATGGGTTGAAATTGCCACCACTATCCACACTAATACAACAACTAA
ATATCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGTAGTGCTAGAGCATACGGCAGAAGACGAAC
TATAGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG 3’
ATATCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGTAGTGCTAGAGCATACGGCAGAAGACGAAC 5’
TATAGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG 3’
In the end, cDNA fragments are oriented: the universal and the indexed adapters are located at the 5′-most and 3′-most regions of the first cDNA strand, respectively.
5’ AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTTCGCTTAGGCTAGCA
3’ TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGAAGCGAATCCGATCGT
GGAGGAATCCAATAGAAATATGAAATCACATTTGATCCATTATATTGCGGGTTGAAAATGGGTTAACCCAAAT
CCTCCTTAGGTTATCTTTATACTTTAGTGTAAACTAGGTAATATAACGCCCAACTTTTACCCAATTGGGTTTA
GGGTTTATGACGGGCTGGTTAAGAAAATATGGGTTGAAATTGCCACCACTATCCACACTAATACAACAACTAA
CCCAAATACTGCCCGACCAATTCTTTTATACCCAACTTTAACGGTGGTGATAGGTGTGATTATGTTGTTGATT
TATAGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG 3’
ATATCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGTAGTGCTAGAGCATACGGCAGAAGACGAAC 5’
Binding to flow cell
The cDNA library can now be loaded onto the Illumina flow cell, which is coated with two different kinds of oligos. The bullet points (•••) represent the bond to the flow cell.
Universal adapter oligo: complementary to the first (5′-most) 20 bases of the universal adapter:
5’ TCGGTGGTCGCCGTATCATT••• 3’
Index adapter oligo: identical to the last (3′-most) 20 bases of the indexed adapter:
5’ CGTATGCCGTCTTCTGCTTG••• 3’
At first, cDNAs are bound to the flow cell by both extremities, forming a horseshoe- or bridge-like structure:
•••TTACTATGCCGCTGGTGGCT
5’ AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTTCGCTTAGGCTAGCA
3’ TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGAAGCGAATCCGATCGT
GGAGGAATCCAATAGAAATATGAAATCACATTTGATCCATTATATTGCGGGTTGAAAATGGGTTAACCCAAAT
CCTCCTTAGGTTATCTTTATACTTTAGTGTAAACTAGGTAATATAACGCCCAACTTTTACCCAATTGGGTTTA
GGGTTTATGACGGGCTGGTTAAGAAAATATGGGTTGAAATTGCCACCACTATCCACACTAATACAACAACTAA
CCCAAATACTGCCCGACCAATTCTTTTATACCCAACTTTAACGGTGGTGATAGGTGTGATTATGTTGTTGATT
TATAGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG 3’
ATATCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGTAGTGCTAGAGCATACGGCAGAAGACGAAC 5’
CGTATGCCGTCTTCTGCTTG•••
Bridge PCR (not covered here) forms clusters of flow cell-bound cDNA molecules that correspond to the same initial mRNA fragment. After denaturation of cDNA double strands, bridge-like structures are linearized. Clusters are then made of a balanced mix of two types of fragments, used for the first and second rounds of sequencing, respectively.
Fragment containing the second cDNA strand, used for the first round of sequencing:
3’ TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGAAGCGAATCCGATCGT
CCTCCTTAGGTTATCTTTATACTTTAGTGTAAACTAGGTAATATAACGCCCAACTTTTACCCAATTGGGTTTA
CCCAAATACTGCCCGACCAATTCTTTTATACCCAACTTTAACGGTGGTGATAGGTGTGATTATGTTGTTGATT
ATATCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGTAGTGCTAGAGCATACGGCAGAAGACGAAC 5’
CGTATGCCGTCTTCTGCTTG•••
Fragment containing the first cDNA strand, used for the second round of sequencing:
3’ GTTCGTCTTCTGCCGTATGCTCTAGCACTACACTGACCTCAAGTCTGCACACGAGAAGGCTAGATATAATCAA
CAACATAATCACACCTATCACCACCGTTAAAGTTGGGTATAAAAGAATTGGTCGGGCAGTATTTGGGTAAACC
CAATTGGGTAAAAGTTGGGCGTTATATTACCTAGTTTACACTAAAGTATAAAGATAACCTAAGGAGGACGATC
GGATTCGCTTCTAGCCTTCTCGCAGCACATCCCTTTCTCACATCTAGAGCCACCAGCGGCATAGTAA 5’
TCGGTGGTCGCCGTATCATT•••
Sequencing
Illumina TruSeq sequencing primers
Read 1 primer: identical to the last (3′-most) 33 bases of the universal adapter:
5’ ACACTCTTTCCCTACACGACGCTCTTCCGATCT 3’
This primer anneals to the second cDNA strand and primes the synthesis of a strand that is sense relative to the first cDNA strand. Read 1 will therefore be antisense relative to the initial mRNA molecule.
Read 2 primer: reverse complement of the first (5′-most) 33 bases of the indexed adapter, followed by a thymidine:
5’ GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT 3’
This primer anneals to the first cDNA strand and primes the synthesis of a strand that is sense relative to the second cDNA strand. Read 2 will therefore be sense relative to the initial mRNA molecule.
Index read primer: identical to the first (5′-most) 33 bases of the indexed adapter:
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCAC 3’
To sum up, the 3 sequencing primers anneal to the cDNA fragment in the following way:
5’ AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTTCGCTTAGGCTAGCA…
3’ TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGAAGCGAATCCGATCGT…
5’ ACACTCTTTCCCTACACGACGCTCTTCCGATCT 3’ →
← 3’ TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG 5’
…TATAGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG 3’
…ATATCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGTAGTGCTAGAGCATACGGCAGAAGACGAAC 5’
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCAC 3’ →
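The three sequencing primers can all be derived from the two adapter sequences given earlier. The short check below is a sketch; the variable names are chosen for the example only.

```python
# Sketch: deriving the three sequencing primers from the adapter sequences.

UNIVERSAL = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
INDEXED = "GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# Read 1 primer: 3' end of the universal adapter, final T included.
read1_primer = UNIVERSAL[-33:]

# Read 2 primer: reverse complement of the 5' end of the indexed adapter,
# plus the T that faces the A added during 3' adenylation.
read2_primer = revcomp(INDEXED[:33]) + "T"

# Index read primer: 5' end of the indexed adapter, stopping right
# before the index itself.
index_primer = INDEXED[:33]
index_bases = INDEXED[33:39]

print(read1_primer)
print(read2_primer)
print(index_primer, index_bases)
```

The first base sequenced after the index read primer is therefore the first base of the index, while reads 1 and 2 both start at the first base of the insert.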
First round of sequencing
Read 1 primer anneals to the universal adapter and the sequencing reaction is performed for a specific number of cycles (here, 100 cycles for a final 100-nt read). After read 1 is generated, the newly synthesized strand is discarded.
3’ TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGAAGCGAATCCGATCGT
5’ ACACTCTTTCCCTACACGACGCTCTTCCGATCTTCGCTTAGGCTAGCA
CCTCCTTAGGTTATCTTTATACTTTAGTGTAAACTAGGTAATATAACGCCCAACTTTTACCCAATTGGGTTTA
GGAGGAATCCAATAGAAATATGAAATCACATTTGATCCATTATATTGCGGGTTGAAAATGGGTTAACCCAAAT
CCCAAATACTGCCCGACCAATTCTTTTATACCCAACTTTAACGGTGGTGATAGGTGTGATTATGTTGTTGATT
GGGTTTATGACG 3’
ATATCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGTAGTGCTAGAGCATACGGCAGAAGACGAAC 5’
CGTATGCCGTCTTCTGCTTG•••
This is how read 1 sequence will appear in the FASTQ file:
TCGCTTAGGCTAGCAGGAGGAATCCAATAGAAATATGAAATCACATTTGATCCATTATATTGCGGGTTGAAAATGG
GTTAACCCAAATGGGTTTATGACG
If the insert is shorter than the read length, read 1 runs into the indexed adapter sequence. The 3′ end of read 1 sequences must then be trimmed to remove the following sequence (NNNNNN stands for the index):
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG
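As an illustration of what trimming means here, the following sketch removes the adapter from the 3′ end of a read by exact matching. It is deliberately naive: the function name `trim_3prime_adapter` and the `min_overlap` parameter are assumptions made for the example, only the adapter part located before the index is used, and dedicated trimming tools additionally handle mismatches and base qualities.

```python
# Sketch: naive 3' adapter trimming by exact matching.

ADAPTER_R1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"  # part located before the index


def trim_3prime_adapter(read, adapter=ADAPTER_R1, min_overlap=5):
    """Cut `read` where the adapter, or a prefix of it reaching the read end, starts."""
    for start in range(len(read) - min_overlap + 1):
        tail = read[start:]
        n = min(len(tail), len(adapter))
        # Either the whole adapter is found, or the read ends inside it.
        if tail[:n] == adapter[:n]:
            return read[:start]
    return read


insert = "TCGCTTAGGCTAGCAGGAGG"  # toy 20-nt insert

long_read_through = trim_3prime_adapter(insert + ADAPTER_R1[:30])
short_read_through = trim_3prime_adapter(insert + ADAPTER_R1[:6])
no_adapter = trim_3prime_adapter(insert)

print(long_read_through, short_read_through, no_adapter)
```

The `min_overlap` threshold is a trade-off: a lower value removes shorter adapter remnants but also clips more genuine insert bases that happen to look like the start of the adapter.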
Index sequencing
If multiplexing is used in the sequencing experiment, the index read primer is annealed to the same fragment, the one containing the second cDNA strand. A short sequencing reaction is performed to get the index sequence (here, ATCACG). The newly synthesized strand is then discarded.
3’ TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGAAGCGAATCCGATCGT
CCTCCTTAGGTTATCTTTATACTTTAGTGTAAACTAGGTAATATAACGCCCAACTTTTACCCAATTGGGTTTA
CCCAAATACTGCCCGACCAATTCTTTTATACCCAACTTTAACGGTGGTGATAGGTGTGATTATGTTGTTGATT
ATATCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGTAGTGCTAGAGCATACGGCAGAAGACGAAC 5’
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACG CGTATGCCGTCTTCTGCTTG•••
Second round of sequencing
If paired-end sequencing is performed, the read 2 primer is annealed to the indexed adapter on the fragment containing the first cDNA strand. Again, the sequencing reaction is performed for a specific number of cycles (here, 100 cycles for a final 100-nt read). After read 2 is generated, the newly synthesized strand is discarded.
3’ GTTCGTCTTCTGCCGTATGCTCTAGCACTACACTGACCTCAAGTCTGCACACGAGAAGGCTAGATATAATCAA
5’ GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTATATTAGTT
CAACATAATCACACCTATCACCACCGTTAAAGTTGGGTATAAAAGAATTGGTCGGGCAGTATTTGGGTAAACC
GTTGTATTAGTGTGGATAGTGGTGGCAATTTCAACCCATATTTTCTTAACCAGCCCGTCATAAACCCATTTGG
CAATTGGGTAAAAGTTGGGCGTTATATTACCTAGTTTACACTAAAGTATAAAGATAACCTAAGGAGGACGATC
GTTAACCCATTTTCAACC 3’
GGATTCGCTTCTAGCCTTCTCGCAGCACATCCCTTTCTCACATCTAGAGCCACCAGCGGCATAGTAA 5’
TCGGTGGTCGCCGTATCATT•••
This is how read 2 sequence will appear in the FASTQ file:
ATATTAGTTGTTGTATTAGTGTGGATAGTGGTGGCAATTTCAACCCATATTTTCTTAACCAGCCCGTCATAAACCC
ATTTGGGTTAACCCATTTTCAACC
Again, if the insert is shorter than the read length, read 2 runs into the universal adapter sequence. The 3′ end of read 2 sequences must then be trimmed to remove the following sequence:
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
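Both trimming sequences follow directly from the adapter sequences, as the sketch below shows. The function name `read_through_sequences` is made up for the example, and the index is assumed to occupy 6 bases starting right after the first 33 bases of the indexed adapter.

```python
# Sketch: what a read runs into once it passes the end of a short insert.

UNIVERSAL = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
INDEXED = "GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def read_through_sequences(universal, indexed, index_start=33, index_len=6):
    """Adapter sequences seen at the 3' end of read 1 and read 2."""
    # Mask the index, which differs from one library to the next.
    masked = (
        indexed[:index_start]
        + "N" * index_len
        + indexed[index_start + index_len:]
    )
    # Read 1 continues into the A-tail, then into the indexed adapter.
    in_read1 = "A" + masked
    # Read 2 continues into the complement of the universal adapter,
    # whose final T gives the leading A.
    in_read2 = revcomp(universal)
    return in_read1, in_read2


in_read1, in_read2 = read_through_sequences(UNIVERSAL, INDEXED)
print(in_read1)
print(in_read2)
print(in_read1[:13] == in_read2[:13])
```

The two sequences share their first 13 bases (AGATCGGAAGAGC), which correspond to the A-tail followed by the annealed stem of the Y-shaped adapter.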
Conclusion
The sequencing reads 1 and 2 are then arranged as follows:
5’ GAACUGAUAAAGAGGAAACUAAAGCCACCCCAGAGUAUUUCUGAUUGCAGCCAUUGGUGCCUGCCUGGAAUGCCC
GAUACAUGGAAUAGGUUACUAUAUGCAUCUCUGCUUUUGGAUCACCCCAUUGAUUCCCCUAGUCUAUCACAUUUG
GUCUCUUGAAAAUCAGAGAUUUACCAUCAGUAGGCUAUAUUAGUUGUUGUAUUAGUGUGGAUAGUGGUGGCAAUU
Read 2 → ATATTAGTTGTTGTATTAGTGTGGATAGTGGTGGCAATT
UCAACCCAUAUUUUCUUAACCAGCCCGUCAUAAACCCAUUUGGGUUAACCCAUUUUCAACCCGCAAUAUAAUGGA
TCAACCCATATTTTCTTAACCAGCCCGTCATAAACCCATTTGGGTTAACCCATTTTCAACC
CGTCATAAACCCATTTGGGTTAACCCATTTTCAACCCGCAATATAATGGA
UCAAAUGUGAUUUCAUAUUUCUAUUGGAUUCCUCCUGCUAGCCUAAGCGAGAUAUCUACUGAGGGAACUUUUGAA
TCAAATGTGATTTCATATTTCTATTGGATTCCTCCTGCTAGCCTAAGCGA ← Read 1
CUUUCAUAUGAGUGUAUAAACGUGUAUGUUGAUUGUUAAAAAAAAAAAAAAAAAA 3’
Reads 1 and 2 are antisense and sense relative to the initial mRNA, respectively. Therefore, the library orientation is “reverse-forward”. This is why, for instance, option --SS_lib_type RF must be used when performing a de novo assembly with the Trinity assembler.
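The orientation of each read can be verified against the 200-nt fragment used throughout this post. The sketch below only uses the first 30 bases of each read and exact string matching; the function name `orientation` and the "F"/"R" return values are conventions chosen for the example.

```python
# Sketch: checking the orientation of each read relative to the mRNA.

FRAGMENT = (
    "CAGAGAUUUACCAUCAGUAGGCUAUAUUAGUUGUUGUAUUAGUGUGGAUAGUGGUGGCAAUUUCAACCCAUAUUU"
    "UCUUAACCAGCCCGUCAUAAACCCAUUUGGGUUAACCCAUUUUCAACCCGCAAUAUAAUGGAUCAAAUGUGAUUU"
    "CAUAUUUCUAUUGGAUUCCUCCUGCUAGCCUAAGCGAGAUAUCUACUGAG"
)
READ_1_START = "TCGCTTAGGCTAGCAGGAGGAATCCAATAG"  # first 30 bases of read 1
READ_2_START = "ATATTAGTTGTTGTATTAGTGTGGATAGTG"  # first 30 bases of read 2

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orientation(read, rna):
    """'F' if the read is sense to `rna`, 'R' if antisense, '?' otherwise."""
    transcript = rna.replace("U", "T")
    if read in transcript:
        return "F"
    if revcomp(read) in transcript:
        return "R"
    return "?"


lib_type = orientation(READ_1_START, FRAGMENT) + orientation(READ_2_START, FRAGMENT)
print(lib_type)
```

The resulting string matches the value passed to Trinity's --SS_lib_type option mentioned above.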