DNA sequencing is the determination of the exact order in which the roughly 3 billion chemical building blocks of human DNA are arranged along the 24 different human chromosomes. These building blocks, called nucleotide bases, are adenine, guanine, cytosine and thymine, abbreviated A, G, C, and T, respectively.
Sequencing has revealed the nature of the approximately 25,000 human genes and of the regions that control them. More generally, sequencing determines the exact order of nucleotides in a DNA sample, and from this order the kind of information carried by a particular segment of DNA can be deduced.
For example, this method can be used to analyze a disease-causing mutation and pinpoint its exact location in the DNA. These developments in DNA research have given us a more informed understanding of how the body works.
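Once a sample has been sequenced, locating a point mutation amounts to comparing it base by base with a reference sequence. The sketch below assumes the two sequences are already aligned and of equal length, so it only detects single-base substitutions, not insertions or deletions. The function name `find_point_mutations` and the toy sequences are hypothetical.

```python
def find_point_mutations(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumes both sequences are already aligned to the same length, so only
    single-base substitutions are reported (no insertions or deletions).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, sample_base)
        for pos, (ref_base, sample_base) in enumerate(
            zip(reference.upper(), sample.upper()), start=1
        )
        if ref_base != sample_base
    ]


# Hypothetical toy sequences, not taken from any real gene.
print(find_point_mutations("ACGTTAGCCA", "ACGTCAGCCA"))  # [(5, 'T', 'C')]
```

Real variant detection first has to align the sequences, which is why the equal-length check is treated as a precondition here rather than handled.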
Frederick Sanger, who developed the 'dideoxy' (chain-termination) method of DNA sequencing, also known as the 'Sanger method', shared the 1980 Nobel Prize in Chemistry for this achievement. Later versions of the method that use fluorescent dyes made sequencing faster through automated analysis and made it easier to map the structure of DNA.
Scientific data sharing and joint research across continents have led to the sequencing of many plant, animal, and microbial genomes. Earlier methods, by contrast, were labor-intensive and expensive.
The pace of progress can be gauged from a simple comparison: 200 million base pairs were sequenced in 1998, whereas in January 2003 the US Department of Energy's Joint Genome Institute alone sequenced 1.5 billion nucleotide bases.
Mapping involves identifying the set of clones that span the region of the genome to be sequenced.
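If each clone's position on the chromosome is already known, choosing a small set of overlapping clones that spans a region (often called a minimal tiling path) can be sketched as a greedy interval cover. The clone names and coordinates below are hypothetical, and the sketch assumes the positions are known exactly; in practice they are estimated from the mapping data themselves.

```python
def minimal_tiling_path(clones, start, end):
    """Greedily pick the fewest clones whose intervals cover [start, end).

    clones: iterable of (name, clone_start, clone_end), half-open coordinates.
    Returns the chosen clone names in left-to-right order.
    Raises ValueError if the clones leave a gap in the target region.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    path = []
    covered_to = start  # everything in [start, covered_to) is already covered
    i = 0
    while covered_to < end:
        best = None
        # Among clones starting at or before the covered frontier,
        # keep the one that reaches furthest to the right.
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"gap in clone coverage at position {covered_to}")
        path.append(best[0])
        covered_to = best[2]
    return path


clones = [
    ("clone_1", 0, 150),
    ("clone_2", 100, 260),
    ("clone_3", 120, 200),
    ("clone_4", 240, 400),
    ("clone_5", 380, 500),
]
print(minimal_tiling_path(clones, 0, 450))
# ['clone_1', 'clone_2', 'clone_4', 'clone_5']
```

Here `clone_3` is skipped because it lies entirely inside the stretch already covered by `clone_1` and `clone_2`.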
Creation of a Library
This step involves making a library of smaller clones from the mapped clones. Each chromosome, which contains 50 million to 250 million bases, is broken down into smaller pieces. Each of these pieces is then used as a template to generate a set of fragments whose lengths differ by a single base.
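The fragments "differing by a single base" come from chain termination: synthesis of a new strand copying the template stops wherever a dideoxy nucleotide is incorporated. A minimal sketch, assuming one fragment for every possible stopping point and ignoring the primer and strand orientation (the template is simply complemented base by base). The name `chain_termination_fragments` is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def chain_termination_fragments(template: str) -> list[str]:
    """Return every terminated fragment, shortest first.

    Simplifications: the primer is ignored and strand orientation is not
    modelled; the new strand is just the base-by-base complement of the template.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template.upper())
    # A dideoxy nucleotide at position k halts synthesis, leaving the first k
    # bases of the new strand; consecutive fragments differ by one base.
    return [new_strand[:k] for k in range(1, len(new_strand) + 1)]


print(chain_termination_fragments("TACG"))  # ['A', 'AT', 'ATG', 'ATGC']
```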
Preparation of the Template
DNA is purified from the set of smaller clones so that it can serve as the template for the sequencing reactions.
The fragments in each set are separated by size using gel electrophoresis, which makes it possible to determine the sequences of the smaller clones. New fluorescent dyes, one for each of the four bases, allow all four bases to be separated in a single lane on the gel.
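Because shorter fragments migrate faster through the gel, ordering the bands by fragment length and reading the dye color of each band's terminal base recovers the sequence. In this sketch each band is a hypothetical `(length, terminal_base)` pair, and a missing band is treated as an error.

```python
def read_gel_lane(bands: list[tuple[int, str]]) -> str:
    """Recover a sequence from (fragment length, dye-labelled terminal base) bands.

    Shorter fragments run faster, so sorting by length reproduces the order
    of the bands in the lane; the terminal bases, read in that order, spell
    out the newly synthesized strand.
    """
    if not bands:
        return ""
    ordered = sorted(bands, key=lambda band: band[0])
    lengths = [length for length, _ in ordered]
    # Adjacent bands should differ by exactly one base; a gap means a missing band.
    if lengths != list(range(lengths[0], lengths[0] + len(lengths))):
        raise ValueError("bands must differ in length by exactly one base")
    return "".join(base for _, base in ordered)


# Hypothetical bands, listed in no particular order.
print(read_gel_lane([(3, "G"), (1, "A"), (4, "C"), (2, "T")]))  # ATGC
```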
The original sequence of each small piece generated in the first step is then reconstructed. The resulting four-colored chromatogram (electropherogram), in which each of the four bases appears as a peak of a different color, is read automatically by sequence analyzers as part of the finishing stage.
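Reading the chromatogram (base calling) can be approximated by picking the brightest of the four dye channels at each peak. The sketch assumes peak positions have already been located and uses a hypothetical `min_ratio` threshold to mark ambiguous peaks as `N`; real base-calling software is considerably more sophisticated.

```python
BASES = ("A", "C", "G", "T")


def call_bases(trace: dict[str, list[float]], min_ratio: float = 1.5) -> str:
    """Call one base per peak from a four-channel chromatogram.

    trace maps each base to its dye intensity at each (already located) peak.
    The brightest channel is called if it is at least min_ratio times the
    runner-up; otherwise the peak is reported as the ambiguity code 'N'.
    """
    n_peaks = len(trace["A"])
    calls = []
    for i in range(n_peaks):
        ranked = sorted(BASES, key=lambda b: trace[b][i], reverse=True)
        top, runner_up = trace[ranked[0]][i], trace[ranked[1]][i]
        calls.append(ranked[0] if top > 0 and top >= min_ratio * runner_up else "N")
    return "".join(calls)


# Hypothetical intensities at four peaks; the last peak is ambiguous (C vs G).
trace = {
    "A": [900, 50, 40, 60],
    "C": [30, 40, 800, 500],
    "G": [20, 700, 30, 450],
    "T": [10, 60, 50, 40],
}
print(call_bases(trace))  # AGCN
```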
Annotation and Verification
Computers assemble the short sequences, in blocks of about 500 bases each (the 'read length'), into long continuous stretches. These stretches are then scrutinized for anomalies, gene-coding regions, and other features. The completed sequences are stored in public databases such as GenBank, where the data are available for reference anywhere in the world.
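Assembly can be illustrated with a greedy overlap strategy: repeatedly merge the two reads whose end and start overlap the most. This is a textbook simplification chosen for illustration, not necessarily the method used by the genome projects; it assumes error-free reads and no repeats, and the hypothetical reads below are a few bases long rather than about 500.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap.

    Returns the resulting contigs (long continuous stretches).
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    # Drop reads wholly contained in another read; they add no new sequence.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reads = ["ATGCGTAC", "GTACGTTA", "GTTAGC"]
print(greedy_assemble(reads))  # ['ATGCGTACGTTAGC']
```

Repeated sequences make greedy merging unreliable, which is one reason assembled stretches still need the scrutiny described above.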
The human DNA sequence of 2003 was still only an approximation of the human chromosomes, and it has since undergone many revisions and additions to define each chromosome fully. Small genomes, such as those of viruses and bacteria, were sequenced first, followed by major advances such as the mapping of the human genome and progress in human gene therapy.