Biotech: DNA Fingerprinting
Watch any kind of crime show on TV, and invariably they are going to start talking about DNA fingerprinting. Turns out a lot of people actually don't know what the heck DNA fingerprinting is. I remember I had a friend in college who very confidently explained to somebody else that it involved finding a fingerprint at a crime scene and using that fingerprint to grow a clone of the criminal so you could see what his face looked like. He was an English major. To make it easier on our English major friends, perhaps we should call it DNA testing.
Now, the field has grown from the initial single technique that involved chopping up DNA and looking at the pattern of pieces that you created, to a whole variety of techniques that have really revolutionized the field. The two major techniques are called RFLP and PCR, and the science behind them is actually pretty simple. Like I said, there are other forms. You don't need to worry about them, because it took the AP Biology people 20 years before they started asking questions about PCR, and that's the major technique being used in labs today.
So before I get in to what the heck is PCR or what is RFLP, I need to discuss a process called gel electrophoresis. Gel electrophoresis is a way of analyzing the DNA samples coming out of those two techniques. Once you get how to analyze the DNA, then I'll discuss RFLP, that first initial technique that involved chopping up patterns of DNA in order to identify people.
Last, I'll go into PCR. PCR is the major technique now being used in most labs. It involves copying sections of DNA, whatever sections you want. Which can be used for identification purposes, testing to see if somebody has a particular kind of gene, or even you can use it to do what's called gene sequencing, where you're identifying the specific bases in a segment of DNA.
So, most methods of DNA testing involve using gel electrophoresis, which is a way of measuring the length of the DNA molecules being generated by these various forms of DNA testing. Both RFLP and PCR rely on it. Now, gel electrophoresis depends on a chemical called agarose, which comes from seaweed.
Agarose, when you mix it with a watery buffer solution, winds up forming a gel, kind of like Jell-O. Now, why would you do that? Well, this is something called a gel tray. What you do is you pour in the gel, and see this comb here? When the gel hardens, it winds up creating little wells that you can then insert the DNA into. Why do that? Well, I can then take this gel casting tray and put it into this thing here called a gel electrophoresis box. When I put the DNA samples into the wells and then close the box, I can then plug this in to a power supply. It'll run a current from here to there. And since DNA has a negative charge, it's attracted towards the anode, the positive electrode over here. And that forces the DNA to move from the wells into the gel itself.
Now, turns out that agarose, when you make it properly, creates a regular latticework of the agarose polysaccharide molecule. And this creates a network of little tubes and channels that the DNA molecules have to weave and wind their way through. And it turns out that the shorter the molecule of DNA, the faster it can go through the gel. Larger molecules wind up having to move at a slower pace. Now, to help you understand this, imagine you had to run a race in your classroom. But instead of just running from one wall of the classroom to the other, imagine you had to weave in and out of every single desk, all the way until finally you got to the opposite side of the classroom. You could do that fairly quickly.
What if you had to hold hands with say five people on a line? And then you start to try running your way through? Yes, individually each of you is pretty fast, but when you're all hooked together like that, you start taking sharp corners and getting hurt in places where you really don't want to get hurt. Same thing with DNA, the small pieces can weave their way through the channels quickly, the larger pieces have a harder time getting through.
Now, this can be extremely useful because, so what if you run a DNA sample? How do you know how big it is? Well, you run a known sample. Let's take a look at a very simple diagram of what a gel might look like.
Here we see a lane. Here's the well, and I added my DNA into this well. Now I know how big these pieces are, because I made them that big. And I can see that the smallest piece, which is only 1000 base pairs long, zipped along pretty fast. This one here is 2500 base pairs long. And DNA is measured in pairs of bases from the double helix. This piece here is 5000 base pairs long, and this one here is 10,000 base pairs long. Into this lane, I loaded a sample of unknown DNA. Can you guess how long it is? That's right. It lines up exactly with that 5000 base pair piece. Thus, I know it has to be roughly 5000 base pairs long as well.
What about this guy who doesn't line up precisely with any of these? How do I figure that out? That's pretty simple. Here is 5000, there is 2500. Just by using my eyeballs I can say, well, it's somewhere between 2500 and 5000. So maybe 4000, 3500 base pairs long. Now, if I wanted to get complicated with this, I could take each of these, measure how many millimeters it moved, and put that on a graph, creating a standard curve, and then use that to figure out my unknowns.
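The standard-curve idea can be sketched in a few lines of Python. This is only an illustration: the fragment lengths come from the example gel, but the migration distances in millimeters are made up, and the code assumes the usual rule of thumb that distance moved is roughly linear in the logarithm of fragment length over the range of the standards.

```python
import math

# Hypothetical standards: (length in base pairs, distance migrated in mm).
# The lengths match the example gel; the distances are invented for illustration.
LADDER = [(10000, 10.0), (5000, 20.0), (2500, 30.0), (1000, 40.0)]

def estimate_length(distance_mm, ladder=LADDER):
    """Estimate a fragment's length by interpolating log10(length) against distance."""
    points = sorted((d, math.log10(bp)) for bp, d in ladder)
    for (d1, y1), (d2, y2) in zip(points, points[1:]):
        if d1 <= distance_mm <= d2:
            frac = (distance_mm - d1) / (d2 - d1)
            return 10 ** (y1 + frac * (y2 - y1))
    raise ValueError("distance falls outside the standards; cannot interpolate")

# A band halfway between the 5000 and 2500 standards comes out around 3500 bp.
print(round(estimate_length(25.0)))
```

With real data you would measure the distances off your own gel, and the straight-line assumption only holds approximately, so treat the answer as an estimate.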
To get more practice on this, I strongly recommend you go to the online virtual lab for the AP Biology lab on DNA testing, and take a look.
Click on the part about analysing your results, two I believe it is. And they'll give you plenty of practice to go ahead and learn how to do this. It's one of the standard things that they'll have, either in the multiple choice or on the essay.
So, now that we've discussed gel electrophoresis, how to measure the lengths of DNA that have been created, let's take a look at the first of those two techniques I discussed, RFLP. Now, I had talked about RFLP but I hadn't explained what does the R, the F, the L, the P stand for. And this is one of the things that help you in science. Don't get confused or scared by lots of syllables. Just look at them. They tell you a lot about what's going on.
Restriction Fragment Length Polymorphism. Well, what does that mean? A restriction fragment is a piece of DNA. I've talked about enzymes before. Well, there is a particular kind of enzyme called a restriction enzyme. And when we use those enzymes on DNA, they break a long piece of DNA into smaller pieces or fragments. And the length of those pieces may be long, short, medium. Now the polymorphism: poly means many, morph means form. Well, because my DNA is different from your DNA, the pattern of lengths of the cut pieces that I create will be different from the pattern that your DNA creates. And that's simply what Restriction Fragment Length Polymorphism refers to.
So let's take a closer look at how this might work. So, those restriction enzymes I mentioned briefly before, what they are is a special kind of enzyme that goes through DNA and looks for specific sequences such as GATC in this case. Now, just as a side note, sometimes you'll see restriction enzymes called endonucleases, because they cut inside a nucleic acid strand rather than chewing in from its ends. But everybody I know who is cool like me calls them restriction enzymes.
So here we see a sample of DNA from two different individuals. Now because everybody's DNA is a little bit unique, at this gene, this section of the DNA, this person had a different sequence. Instead of GAATC, they had GTATC. Those restriction enzymes only cut at their specific sequence. So as it goes along sample one, it'll chop it right here. Notice how it creates these overhanging ends with this unpaired T, T and A there and this unpaired A A T in there. As a little heads up, this becomes very useful in genetic engineering. But I don't want to go there now. The same restriction enzyme, however, going along this sequence of DNA, will go along and completely ignore that region and leave it uncut.
So, sample 1, sample 2. Person 1, person 2. We wind up creating two pieces here, one longer piece there. I then load sample 1 into the first well, I get two bands. And those two bands move fairly quickly. Why? Because they are short. While over here with sample 2 when I load it, it has a single band. And it's a much longer molecule of DNA, so it can't move as far as it tries to weave its way in through the Agarose.
So we have a band here close to the beginning of the gel, indicating one larger piece of DNA versus two smaller pieces of DNA. If I then had a third well and I loaded in a crime scene DNA sample, then I could see which pattern does it match.
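Here is a minimal Python sketch of that comparison. The two sequences are invented, and the exact position of the cut inside the recognition site (`cut_offset`) is an assumption chosen for illustration. The only point is that a one-base difference at the site changes how many fragments you get.

```python
def digest_linear(sequence, site, cut_offset):
    """Return the fragment lengths after cutting a linear strand at every site.

    cut_offset is how many bases into the site the cut falls (assumed here).
    """
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    edges = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(edges, edges[1:])]

# Made-up sequences: identical except for one base inside the recognition site.
person_1 = "CCTAGGAATCTTACGGA"
person_2 = "CCTAGGTATCTTACGGA"

print(digest_linear(person_1, "GAATC", 1))  # two shorter fragments
print(digest_linear(person_2, "GAATC", 1))  # one uncut, longer fragment
```

A crime scene sample run through the same function would then be matched against whichever pattern of lengths it reproduces.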
Now, something that I've not talked about before is, how do you actually see the DNA? Because it turns out DNA is clear. Well, that's why you often have to add special stains such as ethidium bromide or methylene blue, and there are many of them. And especially with RFLP, when you are cutting up human DNA molecules or chromosomes, you wind up getting thousands of different bands.
So in that case a lot of times what they'll do is they'll add special radioactive probes. These are specially pre-made pieces of single-stranded DNA that will bind to particular sequences. So instead of seeing a whole mess of bands, you may only have to look at, say, 10, 20, 30, depending on which probes you are using.
Now, what's one of the standard kind of questions that you may see on the AP Biology exam involving RFLP? What they'll do is they'll say, here's a map of a plasmid. A plasmid for those of you who don't know, is a circular molecule of DNA. Here's a gel where we've cut this plasmid with two different enzymes. I simply call them restriction enzyme A and restriction enzyme B. On this plasmid you have to figure out where are the restriction sites, the sequences of As, Ts, Cs and Gs that each enzyme is specific to. And what they do is they'll show you on this side, this is one of those molecular standards. This is where they've run a number of known molecules of DNA with known lengths. So let me add in their lengths here.
Now I'll tell you just from the get-go that one of them is 1000 base pairs long. Another one is 10000 base pairs long. Which one would be the 1000? It would be the one that's winning the race, that's moving fastest. So this one here is 1000 base pairs long. I set this up to be nice and easy. This is two, this is three, this is four, this is five, this is six, this is seven, this is eight. I mean 8000, of course. So we can see here that it's gone through, and our known standard has now allowed us to measure all of these other guys.
Now, in this first lane here, I added in our plasmid, our circular molecule of DNA, but I also added in that first enzyme, restriction enzyme A. How many pieces did it cut it into? Well, I can see two pieces.
So I know there must be two locations on my plasmid where there is the restriction site for enzyme A. So let's see, how far did they move? I can line this up and I can see that one of my pieces is 3000 base pairs long. The other one is 5000 base pairs long. If I count going around, if this is 0, 1, 2, 3, 4, 5, 6, 7, 8, I can see it's 8000 all the way around. Where can I put another cutting site for enzyme A? Well, I know that the total all the way around is 8, and three plus five is eight. So it must be 1000, 2000, 3000: here is my cutting site for enzyme A.
What about enzyme B? How many cutting sites does it have? Well, I take a look and I see there is only one band. If I take a circle and I cut it once, I only get one piece. Do I know where that cut is? No. I just know that somewhere in my 8000 base pair long circle, there is a cutting site. So I don't have much data now. What happened to the DNA when I cut it with both enzymes in the exact same test tube? Well, I wound up getting a piece that's 1000 base pairs long, another piece that is 2000 base pairs long, and then a third piece that's 5000 base pairs long. And this is where your logic skills kick in.
Let's see, well, 5000. I saw originally with my A, it created two pieces, one that was 5000 and one that was 3000. That 5000 base pair piece seems to be still intact. Which tells me, in this side, 1, 2, 3, 4, 5, there is not that B cutting site. That tells me it must be somewhere around in here. So where could it be? Well, it's 1000 base pairs away from one of the As, and 2000 base pairs away from the other one.
So I could actually put it in two different locations. I could put it here, which would work, or I could put it here, which would work. I'll go ahead and I'll put it right here, and I'll write B. Let me double check this.
If I have A and B, I wind up cutting here, that's 1000 base pairs long. I also wind up cutting here, that's a 2000 base pair long sequence. And then way over here, that's 5000. For those of you who missed it, 1, 2 and 5. That's it. This was a basic part of an essay question from several years ago, you wouldn't believe the number of people who couldn't handle it. You got it though. Go ahead, practice it a little bit, and then it'll make sense.
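If you want to check this kind of plasmid logic mechanically, here is a small Python sketch. It assumes, as in the walkthrough, an 8000 base pair circle with enzyme A sites placed at positions 0 and 3000. The function name and the choice to try B only at multiples of 1000 are my own simplifications.

```python
def digest_circular(plasmid_length, cut_positions):
    """Fragment lengths from cutting a circular plasmid at the given positions."""
    cuts = sorted(set(p % plasmid_length for p in cut_positions))
    if not cuts:
        return []  # an uncut circle yields no linear fragments
    # Distance from each cut to the next one, wrapping around the circle.
    # A single cut gives distance 0, which really means the whole circle.
    lengths = [(cuts[(i + 1) % len(cuts)] - c) % plasmid_length or plasmid_length
               for i, c in enumerate(cuts)]
    return sorted(lengths)

A_SITES = [0, 3000]

print(digest_circular(8000, A_SITES))   # enzyme A alone
print(digest_circular(8000, [1000]))    # one cut anywhere gives one full-length piece

# Try B at each 1000 bp mark and keep positions that reproduce the double digest.
candidates = [b for b in range(0, 8000, 1000)
              if b not in A_SITES
              and digest_circular(8000, A_SITES + [b]) == [1000, 2000, 5000]]
print(candidates)
```

The search comes back with two possible positions for B, both inside the 3000 base pair stretch, which is the same conclusion reached by hand.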
Now, there is a problem with RFLP in that it does require a fairly large amount of DNA, in order to be able to cut it up, and have enough for you to actually be able to see it in a gel. And if you're working with limited samples, and you've cut it up and you accidentally loaded the gel wrong or you used the wrong enzymes, well you just cut up all of your sample. And now you're in deep dookie.
PCR, on the other hand, has a big advantage because instead of cutting up your DNA, you can start with a really small sample. And what you're doing is you're copying the DNA, making more of it. Now, let's take a look at its name, and we can again do like we did with Restriction Fragment Length Polymorphism. We can break it down to try and figure out what it's about.
Polymerase: well, -ase tells us it's an enzyme. So I know this is all about making a polymer. This is an enzyme that makes a polymer. And a chain reaction is a reaction that causes another reaction. So this is all about one of the special enzymes that's used in DNA replication, called DNA polymerase. Now DNA polymerase is the molecule that actually builds DNA. PCR takes advantage of one of the special qualities of that DNA polymerase enzyme.
The DNA polymerase enzyme only extends regions of DNA when it's copying. It can't actually begin a strand. Now if you think back to DNA replication, you know that it first begins with the unwinding of the helix. And that is usually accomplished by a bunch of enzymes. And then another enzyme called primase starts building the beginning of the new strand, called a primer. And then DNA polymerase extends wherever there is a primer.
In PCR, what you do is you eliminate most of the enzymes except for the DNA polymerase. We use heat to unwind the helix. That's one of the reasons why this depends on a special kind of DNA polymerase called Taq polymerase. It's named for Thermus aquaticus, a form of bacteria that lives in volcanic hot pools.
And so the heat that we're using to open up the helix means that we don't have to worry about helicase, the opening enzyme. And the DNA polymerase, the Taq polymerase, can survive the heat. Now instead of letting primase build primers all through the DNA, we put in pre-made, highly specific primers that we made, so it only goes to the section of DNA that we're interested in looking at.
You also add in a bunch of raw materials, the nucleotides that are needed to build our DNA molecule. Then you just wait 90 minutes, and you're done. Let's take a quick look at a really nice YouTube video that goes through this whole process very well.
Now let's go ahead and make the image larger, so it's easier to take a look at. So here we have our test tube and it's filled with DNA samples. We start off at low temperatures. Look at all this DNA. We don't want to copy all of this DNA. Instead, there is just one part somewhere deep within there. Here it is, coming into view. We're just interested in this one region here. So that's the part of the DNA molecule that we're going to build primers for in advance and put them in the test tube with it.
So they've highlighted this section of DNA, our target sequence with green.
This other stuff, blue and magenta, is the other regions of the molecule we have no interest in. Now what we're going to do is, we're going to heat it up. As we begin the first cycle, we heat it up to almost boiling, 95 degrees Celsius. That makes the hydrogen bonds that were holding the DNA double helix together break, because they are only like molecular Velcro.
And here we see our primers for the beginning and end of the gene that we want. We let things cool down, and now the primers have enough of an attraction to hold there by hydrogen bonds. And now comes our special Taq polymerase, a special form of DNA polymerase. And these little green things are the raw materials, the raw DNA nucleotides, the As, Ts, Cs and Gs that are floating around the test tube. Why? Because we put them in there. The polymerase goes along and it copies.
Now it doesn't stop when it reaches the end, because it just keeps copying until it runs out of time. And we only give it maybe 90 seconds at most to do this. That's the end of cycle one. What do we do? We do the whole process over again. You're thinking but why? Well, watch to see what happens.
Again we heat it up to 95 degrees Celsius to separate the two sides. This is sometimes called melting the DNA. We allow it to cool enough so that the primers can attach. These are the exact same primers that were there before. They're still in the test tube. Then we bring it up to the optimal temperature for the DNA polymerase, which is usually around 72 degrees or so. And then we let it extend. We give it enough time and it goes along and it copies. Notice this band here and this band here. We've just made pieces of DNA that are exactly the size of the gene that we're interested in. That's the end of cycle two. Let's move on to our third cycle.
Again, heat it up. It separates.
The primers are allowed to attach as we cool it down to 60 degrees. You can see them here. Raise it up to the optimum temperature for the taq polymerase, and it copies. Yes, we're still making these longer copies. But now, we've just made two molecules that are exactly the right size for our target gene. We just do it over and over, again and again. Now let's fast-forward a little bit. Here we begin cycle four.
We heat things up, we cool things down. Allow the primer to go in, and then bring it back up to the 72 degrees that's the optimal temperature. And it extends it. Notice, we're making more. We have eight made here, eight made there. Now you're thinking, "But this is a whole mess." The thing is, this side here is going to grow a lot faster. It's going to go through exponential growth.
As we begin cycle five, again, we open it up, add the primers and everything happens. Now this actually takes place inside of a machine. You can just add in your chemicals, programme it, and walk away. You'll see by the time we've reached here, now look. These guys are outnumbered. We've got 22 of our target molecules over here only 10 of those.
This side is basically going to start, like I said, growing at an exponential rate. You do this 30 times, which takes you roughly 90 minutes. On this side, that's in the billions; on this side, it's 60. This will show up on a gel, this won't.
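The numbers called out in the video follow a simple pattern, sketched here in Python. This assumes one double-stranded starting molecule and perfect doubling every cycle, which real reactions only approximate. The function name is made up for illustration.

```python
def pcr_counts(cycles):
    """Return (target_copies, longer_copies) of double-stranded DNA after PCR.

    Assumes a single starting molecule and perfect doubling each cycle.
    """
    if cycles < 1:
        raise ValueError("need at least one cycle")
    total = 2 ** cycles     # every molecule is copied once per cycle
    longer = 2 * cycles     # molecules still carrying an over-long strand
    return total - longer, longer

for n in (3, 4, 5, 30):
    print(n, pcr_counts(n))
```

The over-long copies only grow by two per cycle, while the exact-size target copies roughly double, which is why after 30 cycles the target is about a billion copies per starting molecule against 60 of the longer ones.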
So having done that, we've made gazillions and gazillions of whatever DNA sequence we're interested in. And this can be used, as I said, in CSI: you can use this for identification purposes, whether by putting in a bunch of different primers or by putting in primers for just the gene, the section that is different between one person and the next.
Let's take a look at how they could do some DNA identification. Very often with CSI types of things, what they are looking for are sequences in people's DNA where there are repeating elements, sometimes called tandem repeats, whether short tandem repeats or a variable number of tandem repeats. And in this person, you can see this repeating box repeats over and over, many more times than in this person, who has only, say, three repeats.
And what this does is, when we do PCR on the DNA of the person with only three repeats, we make much shorter copies, which can move further. Whereas when we do PCR on the guy with many repeats, starting the primer here and there, we wind up making a much larger piece that only went to here. And since everybody has two copies of every gene, one from mummy, one from daddy, assuming your parents didn't both give you the same tandem repeat, you wind up having two bands. This person is homozygous for one of these tandem repeats.
Now, like I said previously, we could just look for a particular gene. Say, for example, we wanted to test to see if you've been infected by HIV. What we could do is just do PCR using a primer for the HIV virus. You run it on my DNA and hopefully you won't see anything. Whereas with somebody who is HIV positive, you'll see a band in the unfortunate location.
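To make the link between repeat count and band pattern concrete, here is a rough Python sketch. The repeat unit of 4 base pairs and the 100 base pairs of flanking sequence between the primers are invented numbers, and the function name is hypothetical.

```python
def str_band_lengths(repeat_counts, repeat_unit_bp=4, flank_bp=100):
    """Lengths of the PCR products for the tandem-repeat copies one person carries.

    repeat_counts holds the repeat number on each of the two inherited copies.
    repeat_unit_bp and flank_bp are illustrative assumptions.
    """
    return sorted({flank_bp + n * repeat_unit_bp for n in repeat_counts})

print(str_band_lengths((3, 12)))  # two different copies: two bands
print(str_band_lengths((7, 7)))   # homozygous: a single band
```

More repeats between the primers means a longer product, and a longer product stays closer to the well.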
One really fascinating use for PCR, that is allowing us to take leaps and bounds in our knowledge of genes, is something called DNA sequencing. And that takes advantage of a particular kind of nucleotide. Let's take a look at that.
Here we see a standard nucleotide. Now this is the phosphate. This is the five carbon sugar called deoxyribose and this is one of those nitrogenous bases As Cs Ts and Gs in DNA.
How is this different? Remember this is deoxyribose. What does the deoxy mean? That means that this carbon right here should have an O and an H attached to it. That's if it's ribose. We remove it to make deoxyribose. Look what happened here: that oxygen is gone. This is something called, di for two, a dideoxynucleotide, built on dideoxyribose.
And what you do is you can have in your test tube a whole bunch of nucleotides, normal DNA nucleotides, and the DNA polymerase will go along and make you a long chain, but you put in some small percentage of these dideoxy nucleotides. See, that oxygen there is what the next nucleotide needs to join to. And if it's not there, if it's gone, then DNA replication stops there. So by having a small percentage of your free-floating nucleotides in the test tube in this dideoxy form, you'll wind up getting a whole number of different chains of different lengths that all stop at, say, A, because you put in the dideoxy adenine molecule.
And so you could have four lanes, each with dideoxy adenine, dideoxy guanine, dideoxy cytosine or dideoxy thymine. Or what people have done to get really clever is they've put all four of them into the same test tube, but they've added other additional things that make each one of these glow a different colour. And then you can run it in the PCR machine, load it into a gel and then walk away. You have a camera that just sits there and watches and goes, "Green, must be A, yellow must be T." And it just reads the sequence, the As, Ts, Gs and Cs along that DNA molecule. And this is the basic technique behind what's called the Human Genome Project, where they've been sequencing human DNA.
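The logic of reading a sequence off chain-terminated fragments can be sketched in Python. This is a toy model, not how a sequencing machine is programmed: it takes a known strand, works out which fragment lengths each dideoxy base would produce, and then shows that sorting the fragments by length recovers the sequence. It assumes every position gets terminated at least once, and the example strand is made up.

```python
def chain_terminated_lengths(new_strand, dideoxy_base):
    """Lengths of the fragments that stop at a dideoxy version of one base."""
    return [i + 1 for i, base in enumerate(new_strand) if base == dideoxy_base]

def read_sequence(new_strand):
    """Rebuild the sequence by sorting every terminated fragment by length."""
    fragments = []
    for base in "ACGT":  # one reaction, or one colour, per base
        for length in chain_terminated_lengths(new_strand, base):
            fragments.append((length, base))
    # Shortest fragments run furthest, so reading in length order gives the sequence.
    return "".join(base for _, base in sorted(fragments))

print(chain_terminated_lengths("GATTACA", "A"))
print(read_sequence("GATTACA"))
```

The camera in the lecture is doing the last step: noting which colour arrives at each successive length.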
Like I said earlier, there are other techniques, things like gene chips or microarrays. But PCR and RFLP are the two mainstays of the AP Biology test.
The thing to keep clear in your head is the big difference between them: RFLP is all about cutting DNA into smaller pieces, starting with a large sample, while PCR is about starting with a small sample, if you want, and just amplifying or copying a gazillion times whatever gene you're interested in. This makes PCR much more sensitive and much more flexible. And that's why it's become the mainstay tool of most labs.
Again, you need to understand gel electrophoresis, the basic idea being that short molecules can go through quickly while larger molecules lumber along much more slowly. Do practise so that you can use a standard curve in order to figure out the length of an unknown piece of DNA.
And again, go online, go to that virtual lab, because that particular lab, lab six, is one that the AP Biology test writers seem to return to every four years, plus or minus a bit, in the essay question. And invariably, it's somewhere within the multiple choice questions.
Even more important than maximising your score on a test however is, understanding one of these amazing new tools that Biologists have now. And that they're using to peer into some of the basic functioning of our bodies. Because by the time that you have kids, you and your spouse if you have any kind of concerns about genetic problems, it's likely they'll have a test so that you can go to the doctor and easily determine what are the chances that you could pass on that particular syndrome.