DNA sequencing is used in a growing number of fields, such as forensics and medicine, and you might have heard about it in the news. In this article I will walk through the main technologies used to sequence a DNA molecule and how they differ. Genetics is becoming increasingly important, and knowing the basics of how sequencing works is certainly a benefit.
Let’s start with a simple introduction to DNA. DNA is essentially code, made up of four bases: A, T, G and C. Those four letters represent four molecules (nucleotides) that are linked into a chain. Depending on the order of the bases you get a different gene and different information. For example, ATTTGCCA is different from CGGATTCA even though both are made up of eight bases. This code plays an important role in the cell because, when it is read, it is used to produce proteins, including enzymes, which carry out most of the cell’s work. There are also many regulatory elements (proteins) that play a role in how this “code” is used, but sequencing in general refers to reading the series of nucleotides we call DNA, not the epigenetic mechanisms that control it.
We need sequencing to answer questions such as: how does this gene I have differ from the one you have? Or, more importantly, where is the difference that makes you stronger or healthier than me? This ability is incredibly useful for researchers. Thankfully, sequencing has become a lot more economical in the past few years, which has helped genetics grow a lot faster.
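To make the idea of "where is the difference" concrete, here is a minimal Python sketch that compares the two eight-base example sequences from earlier. The function names are my own, and it assumes the two sequences are already the same length and lined up, which real genes usually are not.

```python
from collections import Counter

# The two example sequences from the introduction.
seq_a = "ATTTGCCA"
seq_b = "CGGATTCA"


def base_counts(seq):
    """Count how many times each base appears in a sequence."""
    return Counter(seq)


def differing_positions(a, b):
    """Return the 0-based positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    print(base_counts(seq_a))
    print(base_counts(seq_b))
    print(differing_positions(seq_a, seq_b))
```

Both sequences have eight bases, but they differ at the first six positions, so they carry different information.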
Frederick Sanger developed one of the first sequencing techniques, now known as Sanger sequencing. Like most sequencing methods, it relies on copying DNA with a polymerase, in a reaction similar to PCR. The DNA sample is copied again and again over many cycles. The reaction contains normal nucleotides along with chain-terminating, fluorescently labelled dideoxyribonucleotides. Each type of dideoxyribonucleotide carries a different colour, for example A green, C blue, T red and G orange. As new strands are synthesized from the free-floating nucleotides, sooner or later a dideoxyribonucleotide is incorporated and the chain terminates prematurely. This happens at a random point each time, so you end up with a large number of incomplete chains of every possible length. Imagine an eight-nucleotide chain, ATGCATGC. After many cycles some chains will be A, others AT, others ATG, others ATGC and so on. If you run all of them through a gel, they separate by size, with shorter chains travelling further. Now, if you use a laser and a sensor to detect the light from the terminating dideoxyribonucleotide at the end of each chain, you learn the last nucleotide of each chain. Reading the chains in order of size therefore gives you the first nucleotide, then the second, then the third and so on. I hope this makes sense; if not, I will link to a video with some visual explanations.
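If it helps, the chain-termination idea can be simulated in a few lines of Python. This is only a toy sketch: the copy count and stop probability are made-up numbers, the colours are the example ones from above, and it ignores the fact that the new strand is really the complement of the template.

```python
import random

# Example dye colours from the text (illustrative, not a real kit's colours).
DYE = {"A": "green", "C": "blue", "T": "red", "G": "orange"}


def terminated_fragments(template, n_copies=2000, p_stop=0.2, seed=0):
    """Simulate many copies, each stopping at a random position.

    n_copies and p_stop are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    fragments = set()
    for _ in range(n_copies):
        for i in range(len(template)):
            # With probability p_stop a terminating nucleotide is incorporated.
            if rng.random() < p_stop:
                fragments.add(template[: i + 1])
                break
    return fragments


def read_gel(fragments):
    """Order fragments by length (as a gel does) and read each last base."""
    ordered = sorted(fragments, key=len)
    return "".join(fragment[-1] for fragment in ordered)


def colours_seen(fragments):
    """The colour the sensor would see for each fragment, shortest first."""
    return [DYE[fragment[-1]] for fragment in sorted(fragments, key=len)]


if __name__ == "__main__":
    frags = terminated_fragments("ATGCATGC")
    print(sorted(frags, key=len))
    print(read_gel(frags))
```

The point is that no single chain is ever read from start to finish. The sequence comes from sorting the incomplete chains by length and looking only at the last, coloured base of each one.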
Next generation sequencing.
Next generation sequencing, also known as high-throughput sequencing, has several benefits compared to Sanger sequencing and other earlier techniques. It reads a very large number of short DNA fragments in parallel, which makes sequencing much cheaper per base. The machines are more sophisticated, but the overall cost ends up lower.
First the DNA has to be cut into short sections, with the length depending on the machine. It is loaded along with reagents onto a chip with a glass surface that has short DNA sequences (oligonucleotides) attached to it. The sample DNA binds to those, and fluorescent nucleotides are then added to each strand one at a time. Each fluorescent nucleotide carries a terminator, so the machine controls the cycle and the reaction continues only when the machine allows it. Every time a base is added to the sample strands attached to the glass, a fluorescent signal is emitted whose colour depends on which base (A, T, G or C) was added. This light is recorded, and then the machine changes the reagents to remove the terminator from the fluorescent nucleotide so that the next nucleotide can be added. Noise is an issue here: when many molecules emit light at once, it is hard to pick out the one you want and the specific colour it emits for the base incorporated in that cycle.
There are similar methods that use beads to hold the DNA molecules from the sample, and others that add multiple nucleotides at once. Each has benefits and drawbacks, so the best choice depends on the project. These technologies are in use today, and companies like Illumina frequently patent new developments. At the end, all the sequenced fragments are put together into one sequence. How does anyone know which fragment goes first and which goes last? The fragments made at the beginning overlap with one another, and those overlapping sequences are used to align them at the end.
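The way overlapping fragments are put back in order can be sketched with a small greedy merge in Python. This is a simplified illustration and not how any particular machine or assembler works: the reads are invented, they contain no errors, and `min_len` is an arbitrary minimum overlap I picked.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # nothing overlaps any more; leave the pieces separate
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads


if __name__ == "__main__":
    # Hypothetical fragments given in scrambled order.
    fragments = ["GGCTAAC", "ATTTGCCA", "GCCAGGCT"]
    print(greedy_assemble(fragments))
```

Here the fragments share four-base overlaps (GCCA and GGCT), which is enough to work out their order even though they were given scrambled. Real genomes contain repeats and reading errors, which makes the problem much harder than this sketch suggests.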
Other techniques considered next generation sequencing, like Ion Torrent semiconductor sequencing, use chips that detect the hydrogen ions released when a nucleotide is incorporated into a strand. Each type of nucleotide is added on its own, one after the other, and then washed away, and this is repeated again and again. If a change is detected, you know that the nucleotide you just added (the As, for example) did bind at the next open position on the chain.
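The "add one type of nucleotide, check for a signal, wash, repeat" loop can be written as a toy Python model. The flow order below is an assumption for illustration, the signal is an idealised count with no noise, and for simplicity the strand is the one being built rather than its complementary template.

```python
FLOW_ORDER = "TACG"  # assumed order in which nucleotide types are flowed


def flow_signals(strand, flow_order=FLOW_ORDER, max_flows=100):
    """Return one (base, count) pair per flow.

    count is how many copies of the flowed base were incorporated,
    which stands in for the strength of the hydrogen ion signal.
    """
    signals = []
    pos = 0
    flow = 0
    while pos < len(strand) and flow < max_flows:
        base = flow_order[flow % len(flow_order)]
        count = 0
        # A run of identical bases is incorporated in a single flow.
        while pos < len(strand) and strand[pos] == base:
            count += 1
            pos += 1
        signals.append((base, count))
        flow += 1
    return signals


def call_bases(signals):
    """Rebuild the sequence from the per-flow signals."""
    return "".join(base * count for base, count in signals)


if __name__ == "__main__":
    signals = flow_signals("ATTG")
    print(signals)
    print(call_bases(signals))
```

A flow with a count of zero means nothing bound, so no signal. A count of two means the same base was added twice in a row, which in a real machine has to be told apart from a count of one by signal strength alone.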
Other techniques use complex chips or even graphene to detect which molecules exist in which order on a strand. Companies are trying to develop microfluidic chips that can sequence whole genomes quickly and cheaply.
Tunnels or pores that DNA samples pass through are also being developed. This technology relies on the changes in electrical current that occur as a DNA molecule passes through the pore in a solution. Those changes differ depending on the sequence of each strand, so if you detect them you can, in theory, work out the DNA sequence. Other methods include mass spectrometry and microfluidic chips with multiple known samples attached to a glass bed. There are many ways sequencing can be improved and a lot of potential profit in it, so many companies are trying to develop the best solution.
Third generation sequencing.
Third generation sequencing works by reading the DNA molecules in the sample as they are, without the need for fragmentation or amplification. It can produce longer reads, and some sequencers are as small as a USB stick. Most of the techniques are very similar to those we saw above for second generation sequencing. The main difference is that longer reads are produced from fewer molecules. Companies have attempted to use pores that are surrounded by an electrical field, which changes according to the properties of the molecule passing through. Others use beds with nucleotides and DNA polymerase. The problem here, as with second generation sequencing, is noise: reading that many bases at once produces a noisy signal, so third generation sequencing suffers from decreased accuracy.
Some argue that it shouldn’t be considered a third generation since it is so similar to the second generation technologies.
Epigenetics involves everything on the DNA and around it, like proteins that modulate its expression. One example is methylation, which is basically a methyl group attached to the DNA or to its associated histones (proteins). This is very important, and sequencing it requires a couple of more complicated techniques. There are technologies based on next generation sequencing that can detect methylation. DNA methylation occurs at CpG sites. Those are places in the DNA where a C (cytosine) is followed by a G (guanine), with a phosphate bond in between. The C is either methylated or not, and a gene is expressed more or less depending on how much methylation there is at its CpG sites.
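Finding CpG sites in a sequence you already have is simple, as this short Python sketch shows. The sequence and the set of methylated positions are invented for the example. In practice the methylation status has to come from one of the specialised techniques mentioned above, not from the plain sequence.

```python
def cpg_positions(seq):
    """Return the 0-based positions of every C that is followed by a G."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def methylation_fraction(seq, methylated):
    """Fraction of CpG sites whose C position is in the `methylated` set."""
    sites = cpg_positions(seq)
    if not sites:
        return 0.0
    return sum(1 for site in sites if site in methylated) / len(sites)


if __name__ == "__main__":
    example = "ATCGGCGTACGA"      # hypothetical sequence
    methylated_sites = {2, 9}     # hypothetical measurement
    print(cpg_positions(example))
    print(methylation_fraction(example, methylated_sites))
```

In this made-up case two of the three CpG sites are methylated.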
There are ways to determine histone methylation and to measure other epigenetic markers too, but those are not really “DNA sequencing”.
More specific methods.
If you only need information about a very small piece of DNA, there are kits with reagents that allow you to detect it. Those usually contain small, premade, specific sequences of DNA (probes) that emit a specific wavelength of light when they bind to their complementary sequence. So you can order a specific sequence, take your sample, separate the two DNA strands and add the small sequence you ordered. If it matches any sequence found in your sample you get light; if not, you don’t. Some kits can detect multiple variations of a sequence and emit a different colour for each. Those short sequences are made chemically in a lab with very complex processes, so you order them; it’s not practical to attempt to make them yourself.
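The matching step, where a small ordered sequence either finds its complement in the sample or does not, can be sketched in Python like this. The sample and the short sequences (called `probe` in the code) are invented, and the sketch only checks for a perfect match, whereas real binding also depends on temperature and tolerates some mismatches.

```python
COMPLEMENT = str.maketrans("ATGC", "TACG")


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read in its own direction."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def probe_binds(sample, probe):
    """True if the probe can pair perfectly with either strand of the sample.

    `sample` is one strand of the double-stranded DNA; the other strand
    is its reverse complement.
    """
    sample = sample.upper()
    probe = probe.upper()
    binds_given_strand = reverse_complement(probe) in sample
    binds_other_strand = probe in sample
    return binds_given_strand or binds_other_strand


if __name__ == "__main__":
    sample = "ATTTGCCAGGCT"                 # hypothetical sample strand
    print(probe_binds(sample, "TGGCAA"))    # pairs with TTGCCA in the sample
    print(probe_binds(sample, "GGGGGG"))    # no matching site
```

A result of `True` corresponds to "you get light" and `False` to "you don’t".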
I know I could have made this a lot simpler, and I do not explain the basics. But if you need to understand more than I describe here, you should know how DNA works and the basics of genetics. I wanted to put all those details in one place for students and even professionals who might need a short revision of what’s going on with DNA sequencing. I cannot link to all my sources for this article as I usually do, but I will give you a couple of useful ones to check. And if you want more science and tech news, follow Qul Mind on Facebook and Twitter.