Doug's Introduction to DNA

DNA is the material which "encodes" heredity in all plants and animals. Here we present a discussion of the subject, somewhat more extended than Mark's introduction, designed to help our participants understand the results. You can skip over all this material if you are just interested in results and are not perplexed by the complicated jargon. We are here trying to explain some of the jargon.
DNA is simply a chemical that forms very long molecules. If a single molecule were stretched out to its full length, it would typically be an inch or two long. Inside cells the molecules are all coiled up. Each molecule consists of a long "backbone" to which are attached smaller molecules called "bases". There are four types of bases, called Adenine, Cytosine, Guanine, and Thymine, abbreviated A, C, G, and T.

There are two types of DNA in cells: that in the chromosomes (those big molecules) in the cell nucleus, and that in the mitochondria, tiny sub-units of cells which convert food to energy. The mitochondria in a person come only from the mother, and can be used to trace a person's descent in the all-female line (mother's mother's mother, etc.). The nuclear DNA comes half from the mother, half from the father.

It is just one of the chromosomes, called the Y, that we use in this genealogy project. It is present only in males and is passed from father to son, so we can use it to trace the all-male line. Normally the Y chromosome in a son is identical to that in his father. But sometimes the copying mechanism makes a mistake, resulting in a difference, called a mutation. These mutations, over time, cause different families to have differences in their Y chromosomes that allow us to distinguish them.

In order to understand how we classify our participants, you must understand how the Tree of Mankind has evolved since long ago, perhaps 100,000 years ago, when there lived one man from whom all living men descend in the all-male line. (Other men lived at the same time, but their all-male lines have since died out.) It may seem counterintuitive that all men descend from just one, but the mathematics of population growth shows that this is so. We know from plenty of data that this was a man, living in Africa, and not a sub-human primate, or even a Neanderthal man or Cro-Magnon man. Since he lived, mankind has spread out like a giant tree, with a new branch created every time there is a mutation on the Y chromosome. In the popular press this man has been dubbed "Y chromosome Adam".

Mutations: UEPs (SNPs) versus STRs

Now in fact there are two very different kinds of mutations which occur. These two kinds result in two different classification schemes for men. The first type is called a Unique Event Polymorphism (UEP) and occurs very rarely. In all the time since "Y Adam" only a few (some three hundred) of these mutations have generated lines that exist to this day and have been studied by geneticists. Tests show that we can use UEPs to classify all men into what are called haplogroups (which must be most carefully distinguished from haplotypes). These haplogroups form a tree structure with some 18 major branches and some hundred or so sub-branches.

The second type of mutation is called the Short Tandem Repeat (STR). These occur much more frequently, and can happen repeatedly in the same genealogical line. DNA testing for genealogy measures many different STR markers (as few as 12 or as many as over 100). The list of the results of these STR tests is called a haplotype. Haplotypes are used for most genealogical purposes.
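
To make the idea of a haplotype concrete, here is a small Python sketch. It treats a haplotype as a table from STR marker name to repeat count and lists the markers on which two men differ. The marker names are real Y-STR markers, but the repeat counts and the function name differing_markers are invented for illustration. This is not how any testing company actually scores a match.

```python
# A haplotype modeled as: marker name -> number of repeats measured.
# The repeat counts below are invented for illustration only.
haplotype_a = {"DYS393": 13, "DYS390": 25, "DYS19": 15, "DYS391": 11}
haplotype_b = {"DYS393": 13, "DYS390": 24, "DYS19": 15, "DYS391": 10}


def differing_markers(first, second):
    """Return the sorted names of markers present in both haplotypes
    whose repeat counts are not equal."""
    shared = set(first) & set(second)
    return sorted(name for name in shared if first[name] != second[name])


if __name__ == "__main__":
    # Lists the markers on which the two (made-up) men differ.
    print(differing_markers(haplotype_a, haplotype_b))
```

Only markers tested in both men are compared, so a 12-marker result can still be set against a longer one.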

"UEP" stands for "Unique Event Polymorphism". The expectation is that these are so rare that each one happens only once in all history. (The haplogroup of Somerled, R1a, is a special case. It's strange properties are discussed in this popup.) There are several types of these. One, called the "SNP" for "Single Nucleotide Polymorphism" is so much more common then the others that frequently people just say "SNP" when they really mean "UEP". An SNP (pronounced "snip") just means that one base is replaced by another, by accident. For example, a C might replace a G. The second kind is a deletion or "del". This means that one or more bases (in a row) are simply missing in the son with the deletion. A third kind is the insertion, or "ins" where some new base or bases get inserted into the chromosome.

UEPs are given names like "M17" or "M45" or "P40" or "S26", which are just serial numbers whose letter tells which lab discovered them. Those discovered in BigY or FullGenomes data on our participants are labeled "CLD"; for example, CLD50 is a marker specific to Clanranald.

Now the Y chromosome has some 40 million bases along its ladder, of which only about half have been well studied. UEPs occur only very rarely, so rarely that there is only a 50% chance that a son will differ from his father at even one of the 40 million places.

The main branches of the haplogroup tree are designated with capital letters from A to R. There was an older system of nomenclature in which the sub-branches were designated by alternating numbers and letters as the branches get smaller and smaller, e.g. R1a1a1b2c3. These names keep changing, so people no longer use them, except in some cases for the first few places, like R1a versus R1b. In our study so far, we have men who are in haplogroups E-M35 (formerly E3b), G-M201, I, I-M307 (I1a), I-P37.2 (I1b), I-M223 (I1c), J-M172 (J2), R1a, and R1b. We have classified all of our participants by haplogroup. Many have now had the UEPs tested, while in other cases we can infer the haplogroup from the haplotype. A drawing, very simplified, of the haplogroup tree appears below. The Y tree of the International Society of Genetic Genealogy is probably more accurate.
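
The older alternating names can be read from left to right as a path down the tree: each extra number or letter steps into a smaller sub-branch. The Python sketch below unpacks such a name into the chain of branches it implies. The function name ancestor_branches is our own invention, and the sketch assumes a name made of a capital-letter main branch followed by numbers and single lowercase letters. It says nothing about whether a given name is still current, since, as noted above, these names keep changing.

```python
import re


def ancestor_branches(name):
    """Split an old-style name such as "R1a" into ["R", "R1", "R1a"].

    Assumes a capital-letter main branch followed by runs of digits and
    single lowercase letters. Strict alternation is not enforced.
    """
    match = re.fullmatch(r"([A-Z]+)((?:\d+|[a-z])*)", name)
    if match is None:
        # Marker-style names such as "E-M35" are not handled here.
        raise ValueError(f"not an old-style haplogroup name: {name!r}")
    main_branch, rest = match.groups()
    branches = [main_branch]
    for step in re.findall(r"\d+|[a-z]", rest):
        branches.append(branches[-1] + step)
    return branches


if __name__ == "__main__":
    print(" > ".join(ancestor_branches("R1a1a1b2c3")))
```

Read this way, R1a and R1b share the chain R > R1 and then part company, which is why those first few places remain useful even though the longer names have fallen out of use.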

A
· BT
· · B
· · CT
· · · DE
· · · · D
· · · · E
· · · CF
· · · · C
· · · · F
· · · · · GHIJK
· · · · · · G
· · · · · · HIJK
· · · · · · · H
· · · · · · · IJ
· · · · · · · · I
· · · · · · · · J
· · · · · · · K
· · · · · · · · LT
· · · · · · · · · L
· · · · · · · · · T
· · · · · · · · NO
· · · · · · · · · N
· · · · · · · · · O
· · · · · · · · S
· · · · · · · · M
· · · · · · · · P
· · · · · · · · · Q
· · · · · · · · · R
· · · · · · · · · · R1
· · · · · · · · · · · R1a
· · · · · · · · · · · R1b
· · · · · · · · · · R2
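
The drawing above can be read mechanically: the number of dots in front of a name is its depth in the tree, and its parent is the nearest earlier name with one dot fewer. The Python sketch below applies that rule to just the first ten lines of the drawing (A through F) and traces a branch back to the root. The function names parse_outline and lineage are hypothetical helpers written for this illustration.

```python
# The first ten lines of the drawing, copied as-is.
OUTLINE = """\
A
· BT
· · B
· · CT
· · · DE
· · · · D
· · · · E
· · · CF
· · · · C
· · · · F
"""


def parse_outline(text):
    """Map each branch name to its parent name (None for the root)."""
    parents = {}
    stack = []  # stack[d] holds the most recent name seen at depth d
    for line in text.splitlines():
        if not line.strip():
            continue
        depth = line.count("·")
        name = line.replace("·", "").strip()
        del stack[depth:]  # forget anything at this depth or deeper
        parents[name] = stack[-1] if stack else None
        stack.append(name)
    return parents


def lineage(parents, name):
    """Return the chain of branches from the root down to `name`."""
    path = [name]
    while parents[path[-1]] is not None:
        path.append(parents[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    print(" > ".join(lineage(parse_outline(OUTLINE), "E")))
```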
