The Genetic Genealogist

Adding DNA to the Genealogist's Toolbox

Archive for the "DNA Databases" Category


The YHRD Database

One of the steps in analyzing the results of a Y-DNA test is to search Y-DNA databases for potential matches. Depending on how closely they match, these men might be relatives, either close or distant (in recent genealogical terms, that is; we’re all distantly related, of course).
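To make "how closely they match" concrete, here is a rough sketch of one common, simple convention: add up the absolute differences in repeat counts at the markers two men share (a "stepwise" genetic distance). This is an assumption for illustration, not how YHRD or any particular company scores matches. The marker names are real Y-STR loci, but every repeat count, the entry names, and the `min_loci` threshold are invented.

```python
# Minimal sketch: rank database entries by a simple stepwise genetic distance.
# Repeat counts, entry names and the min_loci threshold are hypothetical.

def step_distance(a: dict[str, int], b: dict[str, int]) -> tuple[int, int]:
    """Return (number of shared loci, summed repeat-count difference at those loci)."""
    shared = set(a) & set(b)
    return len(shared), sum(abs(a[locus] - b[locus]) for locus in shared)


def rank_matches(query: dict[str, int], database: dict[str, dict[str, int]], min_loci: int = 5):
    """Sort entries by distance, skipping entries that share fewer than min_loci markers."""
    results = []
    for name, haplotype in database.items():
        compared, distance = step_distance(query, haplotype)
        if compared >= min_loci:
            results.append((distance, name))
    return sorted(results)


mine = {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS392": 13, "DYS393": 13}
database = {
    "entry_a": {"DYS19": 14, "DYS390": 23, "DYS391": 11, "DYS392": 13, "DYS393": 13},
    "entry_b": {"DYS19": 15, "DYS390": 22, "DYS391": 10, "DYS392": 11, "DYS393": 13},
    "entry_c": {"DYS19": 14, "DYS390": 24},  # too few shared markers to compare
}

for distance, name in rank_matches(mine, database):
    print(f"{name}: genetic distance {distance}")
```

Skipping entries with too few shared markers matters because a "perfect" match on two markers says very little about relatedness.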

One of those databases is YHRD (Y-STR haplotype reference database). The project has two main goals:

  1. The generation of reliable Y-STR haplotype frequency estimates for minimal and extended Y-STR haplotypes to be used in the quantitative assessment of matches in forensic and genealogical casework; and
  2. The assessment of male population stratification among world-wide populations as far as reflected by Y-STR haplotype frequency distributions.
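What does a "haplotype frequency estimate" look like in practice? Below is a sketch of two textbook approaches. The first is the counting method, which divides the number of matching haplotypes by the sample size. The second is an upper confidence bound for a haplotype that has never been observed. Neither is a description of how YHRD actually computes its figures, and the haplotypes (tuples of repeat counts) are made up.

```python
from collections import Counter

# Minimal sketch of two standard ways to estimate how common a Y-STR haplotype is.
# NOT YHRD's method; the reference haplotypes are invented tuples of repeat counts.

def count_frequency(query: tuple[int, ...], reference: list[tuple[int, ...]]) -> float:
    """Counting method: number of matching haplotypes divided by sample size."""
    return Counter(reference)[query] / len(reference)


def zero_count_upper_bound(n: int, alpha: float = 0.05) -> float:
    """Upper (1 - alpha) confidence bound for a haplotype seen 0 times in n samples.

    Solves (1 - p) ** n == alpha for p: the largest frequency at which seeing
    no copies in n samples still has probability at least alpha.
    """
    return 1 - alpha ** (1 / n)


reference = [(14, 24, 11), (14, 24, 11), (15, 23, 10), (13, 24, 11)]
print(count_frequency((14, 24, 11), reference))  # 2 of 4 samples match
print(count_frequency((16, 25, 12), reference))  # never observed
print(zero_count_upper_bound(1000))              # bound for an unseen haplotype
```

The second function shows why database size matters. A haplotype missing from 1,000 samples could still plausibly have a frequency of about 0.3%, and only a larger reference database can narrow that bound.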

According to the YHRD website:

“To this end, a growing number of diagnostic and research laboratories have joined in a collaborative effort to collect population data and to create a sufficiently large reference database. All institutions contributing in this project, participated in an obligate quality control exercise.
“This database is interactive and allows the user the search for Y-STR haplotypes in various formats and within specified metapopulations. Related information i.e. STR characteristics, mutations, population genetic analyses etc. is documented.”

The YHRD database is constantly being updated, and Release 22 came out on August 10th:

“Release 22 is out with 52,655 haplotypes in 464 populations. 50,867 haplotypes of these are completely typed for 9 (Minimal haplotype) and 23,981 for 11 loci (Extended or SWGDAM haplotype). Twenty populations were added or updated today: two Amerindian tribal populations from the Formosa province in Argentina (Pilaga, Toba), one from Venezuela (Caracas), two from provinces in Colombia (Boyaca, Cundinamarca), three from Siberian nomad populations (Western and Central Evens, Iengra Evenks), one from Belarus (Pinsk), three from Ukraine (Kiev, Lviv, Lugansk), three populations from Capetown in South Africa, three from Ravenna, Rimini and Val Marecchia in Italy, one from Hungary, one from Peru and one from Oran in Algeria. We would like to thank the following colleagues for submissions and updates: Daniel Corach and his group (Buenos Aires, Argentina), Brigitte Pakendorf and her group (Leipzig, Germany), Neal Leat and his group (Capetown, South Africa), Susi Pelotti and her group (Bologna, Italy), Pamzsav Horolma and her group (Budapest, Hungary), Ignacio Briceno Balcazar and his group (Bogota, Colombia), Lisbeth Borjas and Tatiana Pardo (Venezuela), Sergey Kravchenko and his group (Kiev, Ukraine), Gian Carlo Iannacone and his group (Lima, Peru) and Carlo Robino and his group in Torino, Italy. Please refer to the section YHRD contributors to get more information.”

HT: Dienekes’ Anthropology Blog

Ethical and Legal Issues Surrounding Large-Scale Genomic Databases

I recently came across a review article by Henry T. Greely, a Professor of Law, Professor (by courtesy) of Genetics, and Director of the Center for Law and Bioethics at Stanford. The article, entitled “The Uneasy Ethical and Legal Underpinnings of Large-Scale Genomic Biobanks,” was recently published in the Annual Review of Genomics and Human Genetics.

According to Mr. Greely, the identities of participants in large-scale genomic biobanks cannot be effectively protected. (A biobank, in this sense, is a database of genotypic and phenotypic data.) Using genetic information, physical information, or a combination of the two, someone could identify an individual in such a large database:

“Someone really interested could get a DNA sample from me – from a licked stamp, a drinking glass, or some tissue – and have it genotyped for a few hundred dollars, but few will have to go to the genomic data; the phenotypic and demographic data will often be sufficient.”

“Eliminating name, mailing address, and social security number does not eliminate identifiers; it just eliminates the easiest identifiers, making the search somewhat more difficult and expensive.”

Unfortunately, it is impossible to remove all the data one could use to identify biobank participants. As Mr. Greely opines, “[t]he more the data is removed or obscured, the more scientific value is lost; the more data is kept, the less real the anonymity.”
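Greely’s point can be illustrated with k-anonymity, a standard privacy measure that the article itself does not discuss. Under k-anonymity, a record is at risk if its combination of "quasi-identifiers" (birth year, sex, part of a postal code, and so on) is shared by fewer than k records. The sketch below is only an illustration: the records, field names, and the choice of k are all hypothetical, and no names or ID numbers are used as identifiers.

```python
from collections import Counter

# Illustration of re-identification risk using k-anonymity (a swapped-in technique,
# not Greely's). Records and field names are hypothetical.

def at_risk(records: list[dict], quasi_identifiers: list[str], k: int = 2) -> list[dict]:
    """Return records whose quasi-identifier combination is shared by fewer than k records."""
    def key(record: dict) -> tuple:
        return tuple(record[field] for field in quasi_identifiers)

    class_sizes = Counter(key(r) for r in records)
    return [r for r in records if class_sizes[key(r)] < k]


records = [
    {"id": 1, "birth_year": 1950, "sex": "M", "zip3": "943"},
    {"id": 2, "birth_year": 1950, "sex": "M", "zip3": "943"},
    {"id": 3, "birth_year": 1972, "sex": "F", "zip3": "021"},
    {"id": 4, "birth_year": 1972, "sex": "F", "zip3": "100"},
]

detailed = at_risk(records, ["birth_year", "sex", "zip3"])
coarser = at_risk(records, ["birth_year", "sex"])
print([r["id"] for r in detailed])  # records 3 and 4 are unique
print([r["id"] for r in coarser])   # dropping zip3 hides them, at the cost of location data
```

Dropping `zip3` protects these four records but discards location information. That is the trade-off Greely describes: the more data is obscured, the less scientific value remains.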

So what is the answer? First, consent forms must disclose that while biobanks will attempt to provide anonymity, they cannot guarantee it. The forms must also disclose that biobanks cannot tell subjects all the risks and benefits, because many future research topics haven’t even been proposed yet. Second, biobanks must avoid upsetting participants with unexpected uses of their materials, either through a thorough consent form or through ongoing communication with research subjects (such as a mailing list or online community). Third, researchers have a moral (and perhaps legal) duty to inform participants of potentially harmful information uncovered by research. This raises a whole host of questions. How strong must the correlation between a gene and a disease be before a participant has to be told? How long should the biobank monitor the participant’s genetic information? Should the biobank be responsible for genetic counseling?

Mr. Greely raises a number of interesting questions that governments and companies around the world will have to answer as the need for biobanks grows and biobanks become easier to create.

“Genetic Genealogy and the Ancestries of African Americans” at the U of C

On June 28, the University of Chicago’s Newberry Library presented a panel discussion entitled “Genetic Genealogy and the Ancestries of African Americans” with Rick Kittles. In addition to being an associate professor of medicine at the University, Mr. Kittles is also the science director of AfricanAncestry.com.

The panel also included Christopher Rabb, a genealogist. The two discussed the difficulties facing African Americans who are interested in discovering their roots. After exhausting paper records, Mr. Rabb used DNA testing to learn more about his paternal and maternal lineages.

Despite the successes of genetic genealogy, “[b]oth Rabb and Kittles recognized that genetic testing for ancestry complicates the history and social reality of race in the United States,” noting that 30% of African Americans descend from Europeans.

The article then states the following:

“Genetic genealogy has its detractors. In a heated question-and-answer session, panel moderator and genealogist Tony Burroughs grilled Kittles on African Ancestry’s accuracy. Using a proprietary database of 30,000 genetic samples from Africa, the company’s work has never been published, reproduced, or otherwise independently verified. Furthermore, because the tests use the DNA of current population groups, the “ancestry tests” in effect tell only the location of “cousins” in Africa, not necessarily where African Americans’ ancestors were located 400 years ago.

“The audience was largely unconcerned by Burroughs’s objections, responding with murmurs, sighs, and rolled eyes. After the program, glowing smiles and firm handshakes bombarded the man whose work promises history and identity for millions.”

I don’t think I would classify Tony Burroughs as a “detractor” of genetic genealogy; he’s just a big fan of good science. Like Mr. Burroughs, I too am wary of any database that isn’t public or available for peer review, such as AfricanAncestry.com’s. Here’s a comment Mr. Burroughs made in a previous interview:

“DNA is going to be very important and it’s on the cutting edge,” said professional genealogist Tony Burroughs, who teaches at Chicago State University. “But it’s not a panacea. You’re not going to discover your entire family tree from a little spit on a cotton swab.”

Eventually, a video of this panel discussion will be available online.

Megan Smolenyak picked up on a small but very interesting detail in the story: 60 Minutes was filming the presentation. It seems the show might be planning a piece on genetic genealogy. Megan also discussed some of the reasons Burroughs questions AfricanAncestry.com’s database.

Genetic Genealogy in the Czech Republic – A Hot Topic!

Two weeks ago, EyeonDNA posted about genetic genealogy testing in the Czech Republic by two companies, Genomac and Forensic DNA Service. A recent article in the Prague Post describes the animosity between these two competitors over ethical concerns.

A few days later, Ludvik Urban responded to the article via Rootsweb, and EyeonDNA shared Mr. Urban’s response with her readers. Today, you can read Genomac’s response (from one of the founders, Dr. Marek Minarik) to Mr. Urban’s concerns about the company.

Whew! Luckily, both sides have had a chance to tell their story, and it makes for some interesting reading!

The Genographic Project Database

With Friday’s release of a paper in PLoS Genetics, the Genographic Project also made available a spreadsheet with the results of over 16,000 mtDNA tests, including HVS-I and SNP results. In addition to sequencing the HVS-I region of mtDNA samples, the Project is now testing 22 SNPs. These SNPs were chosen based on a number of factors, which are discussed in the paper.

“Twenty one SNPs and the 9-bp deletion make up the total of 22 biallelic sites. For simplicity, we will refer to all biallelic sites as SNPs. The number of SNPs tested was gradually increased from ten at inception of the project to the 22 currently used. The ten initial SNPs were 3594, 4580, 5178, 7028, 10400, 10873, 11467, 11719, 12705, and 14766 (numbers refer to the nucleotide position in the mitochondrial genome). The panel was augmented to a total of 20 coding-region SNPs by including the following additional ten SNPs: 4248, 6371, 8994, 10034, 10238, 10550, 12612, 13263, 13368, and 13928. The panel was further augmented by the addition of SNP 2758, to a total of 21 coding-region SNPs and finally by including the 9-bp deletion at position 8280 to a total of 22 coding-region SNPs (Figure 4). Two further changes were made: positions 8994 and 13928 used in some early work were respectively replaced with their phylogenetic equivalents 1243 and 3970. Therefore, the current panel includes the following SNPs, with their respective gene locations shown in brackets [33]: 2758 (16S), 3594 (ND1), 4248 (M), 4580 (ND2), 5178 (ND2), 6371 (COI), 7028 (COI), 8280 (9-bp deletion) (NC7), 8994 (ATPase6), 10034 (G), 10238 (ND3), 10400 (R), 10550 (NDRL), 10873 (ND4), 11467 (ND4), 11719 (ND4), 12612 (ND5), 12705 (ND5), 13263 (ND5), 13368 (ND5), 13928 (ND5), and 14766 (Cytb).”
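The growth of the panel is easy to lose track of, so here is a quick check in Python. The positions and gene labels are copied from the quoted passage. The sketch follows that passage’s final list, which still shows 8994 and 13928, so it does not apply the substitution with 1243 and 3970.

```python
# Positions and gene locations copied from the quoted Genographic passage.
INITIAL_TEN = [3594, 4580, 5178, 7028, 10400, 10873, 11467, 11719, 12705, 14766]
SECOND_TEN = [4248, 6371, 8994, 10034, 10238, 10550, 12612, 13263, 13368, 13928]
LATER_ADDITIONS = [[2758], [8280]]  # SNP 2758, then the 9-bp deletion at 8280

CURRENT_PANEL = {
    2758: "16S", 3594: "ND1", 4248: "M", 4580: "ND2", 5178: "ND2",
    6371: "COI", 7028: "COI", 8280: "NC7 (9-bp deletion)", 8994: "ATPase6",
    10034: "G", 10238: "ND3", 10400: "R", 10550: "NDRL", 10873: "ND4",
    11467: "ND4", 11719: "ND4", 12612: "ND5", 12705: "ND5", 13263: "ND5",
    13368: "ND5", 13928: "ND5", 14766: "Cytb",
}


def build_panel() -> tuple[set[int], list[int]]:
    """Replay the panel's expansion and record its size after each step."""
    panel = set(INITIAL_TEN)
    sizes = [len(panel)]
    for batch in [SECOND_TEN, *LATER_ADDITIONS]:
        panel.update(batch)
        sizes.append(len(panel))
    return panel, sizes


panel, sizes = build_panel()
print(sizes)                        # [10, 20, 21, 22]
print(panel == set(CURRENT_PANEL))  # True: the expansion steps reproduce the final list
```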

The early mtDNA samples were not tested for all of the SNPs, so your results may not be included in this particular spreadsheet. If you log into the Project with your Project ID number and click on “See Your DNA Results” overlaid on the map, you will see a circle for SNPs. Click there to see which of the SNPs you tested positive for.

If you download the spreadsheet and can identify your mtDNA from its HVS-I results, you can get your results for each of the SNPs tested above. For instance, I was able to identify my contribution because my HVS-I sequence is so unusual, and then I could look at all the SNP results for my DNA. Of course I can’t be 100% sure that the sample is mine, but I’m about 99% sure.
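If you’d rather not scroll through 16,000 rows by eye, a few lines of Python can do the matching. This is only a sketch. The spreadsheet’s real column names and mutation notation may differ, so the CSV layout, the column names, and every value below are hypothetical. The comparison ignores the order in which mutations are listed.

```python
import csv
import io

# Hypothetical layout: one HVS-I column of space-separated mutations plus one column per SNP.
SAMPLE_CSV = """id,HVS-I,3594,7028,12705
row1,16223T 16311C,C,C,C
row2,16129A 16223T 16391A,C,T,T
row3,16311C 16223T,C,C,C
"""


def find_matching_rows(csv_text: str, my_mutations: list[str]) -> list[dict]:
    """Return rows whose set of HVS-I mutations equals mine, ignoring order."""
    target = set(my_mutations)
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if set(row["HVS-I"].split()) == target]


for row in find_matching_rows(SAMPLE_CSV, ["16223T", "16311C"]):
    snps = {k: v for k, v in row.items() if k not in ("id", "HVS-I")}
    print(row["id"], snps)
```

In this toy data, a common HVS-I result matches two rows. That is why you can only be reasonably confident about which row is yours when your HVS-I sequence is unusual.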

I didn’t receive these results by email or mail because I originally tested with FTDNA and agreed to add my results to the Project. For those readers who tested with the Project originally: did you get a table of SNP results along with your HVS-I results?

The Genographic Project Public Participation Mitochondrial DNA Database

The Genographic Project is probably the largest genetic genealogy project in the world. For $99, the project will sequence segments of either your mtDNA or your Y chromosome and add the results to its publicly available database. The goal of the project, which has ten research centers around the world, is to “map humanity’s genetic journey through the ages” and to “address anthropological questions on a global scale using genetics as a tool.” There has been a huge response to the project, and the team has just released its first research paper based on the results collected to date:

“Family Tree DNA is proud to announce that the first paper resulting from data collected through the Genographic Project has been published today at the PLOS GENETICS. “The Genographic Project Public Participation Mitochondrial DNA Database” can be found at http://genetics.plosjournals.org and it will be uploaded to the Family Tree DNA public library as well.

The paper resulted from the collaboration of the Genographic Project Scientific Team, Family Tree DNA Genomics Research Center, and the IBM Data Analytics Research Group.”

Results

This paper is all about the mtDNA sequences obtained through the project. In its first 18 months, the project collected an amazing 78,590 mtDNA genotypes! The paper describes the genotyping parameters (i.e., how they go about sequencing the mtDNA) and the frequency of each haplogroup in the database (for instance, 38.2% of the database is Haplogroup H!). It also describes their attempt to identify any potential Neanderthal contribution to the database (there isn’t any).
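A figure like "38.2% of the database is Haplogroup H" is simply a tally divided by the total. Here is a minimal sketch on a small, made-up list of haplogroup assignments (not the Genographic data). It treats each label literally, so subclades such as U5 and T2 are counted separately rather than rolled up into U or T.

```python
from collections import Counter

# Minimal sketch: percentage of samples assigned to each haplogroup label.
# The sample assignments are invented; labels are counted exactly as written.

def haplogroup_frequencies(assignments: list[str]) -> dict[str, float]:
    """Map each haplogroup label to its percentage of the sample, most common first."""
    counts = Counter(assignments)
    total = len(assignments)
    return {hg: 100 * n / total for hg, n in counts.most_common()}


sample = ["H", "H", "U5", "J", "H", "K", "T2", "H"]
for hg, pct in haplogroup_frequencies(sample).items():
    print(f"{hg}: {pct:.1f}%")
```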

The researchers also list a few goals for the future of the project and the scientific community as a whole:

“First, as sequencing procedures have become more efficient and stretches of 600 bp can easily be obtained, we suggest standardizing the reported ‘‘HVS-I’’ range to include positions 16024–16569 as presented herein.”

“Second, it would be worthwhile to create a standard list of coding-region SNPs used by the scientific community for Hg assignment.”

Third, the project should actively recruit samples from people in non-Western populations “to properly survey the genetic variation in non-Western Eurasian lineages.”
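The first suggestion above, standardizing HVS-I to positions 16024–16569, matters because two results can only be compared over positions that both labs actually read. Below is a sketch that filters mutation calls to that window. The "<position><base>" notation is an assumption, and an insertion written like 16193.1C would simply be filed under position 16193.

```python
import re

# Window proposed in the quoted paper (inclusive on both ends).
HVS1_START, HVS1_END = 16024, 16569


def range_length(start: int = HVS1_START, end: int = HVS1_END) -> int:
    """Number of positions in an inclusive range."""
    return end - start + 1


def in_standard_range(mutations: list[str]) -> list[str]:
    """Keep calls whose leading position number falls inside the standard HVS-I window."""
    kept = []
    for call in mutations:
        match = re.match(r"\d+", call)
        if match and HVS1_START <= int(match.group()) <= HVS1_END:
            kept.append(call)
    return kept


calls = ["16519C", "16223T", "16000A", "73G", "263G"]
print(range_length())            # 546 positions
print(in_standard_range(calls))  # ['16519C', '16223T']
```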

So what is the take-home message from this new paper? That the Genographic Database is a valuable, standardized database for geneticists, genealogists, anthropologists, and other -ists. The last paragraph of the study states: “In summary, we report both data and new classification methods developed using by far the largest standardized mtDNA database yet created, and detail the logistic, scientific, and public considerations unique to the Genographic Project. Most importantly, we return to the public a database made possible by their enthusiastic participation in the Genographic Project.”

Here’s Figure 4 from the paper, a phylogenetic tree of mtDNA haplogroups showing the number of each haplogroup represented in the database:

[Figure 4: phylogenetic tree of mtDNA haplogroups, with the number of each haplogroup in the database]

(Note that PLoS uses the Creative Commons Attribution License for all their papers, meaning that the public is free to, among other things, “copy, distribute, display, and perform the work”, as well as “make derivative works,” as long as the user gives the original author and source credit.  Thus, the above figure comes from:

Behar DM, Rosset S, Blue-Smith J, Balanovsky O, Tzur S, et al. The Genographic Project Public Participation Mitochondrial DNA Database. PLoS Genetics 3(6): e104. doi:10.1371/journal.pgen.0030104

This is, of course, another great reason to love and support open-access journals such as PLoS.)

For the First Time, a Human Receives (Almost) Entire Personal Genome!

Admit it, you’re dying to get your hands on Watson’s genome, aren’t you? Who isn’t?! Yesterday 454 Life Sciences handed James Watson his sequenced genome on DVD. There’s a great press release from the Baylor College of Medicine (BCM), where the ceremony took place.

On a very big day for genetics and human beings alike, Watson became the first person to be handed his entire genetic sequence (for those in the know, according to most sources Venter received only part of his sequence).

Amazingly, according to the press release, the genome was sequenced in two months for $1 million. That’s incredible, considering that the Human Genome Project took years and billions of dollars, and even Venter’s project cost $300 million.

The press release is very interesting. Here is an excerpt:

“A report on the project and a commentary on its ethical implications are scheduled to appear in the near future. The raw sequencing data was released today to the publicly available resource called GenBank National Center for Biotechnology Information Trace Archive.”

Additionally,

“Watson, who chose BCM as the site at which the data transfer will take place, plans to evaluate the information included in the genome and write about its significance to him, his family and the future of genetic medicine at a later time.”

The right sidebar of the press release page has video of the presentation, biographies, and links to more information. I’m really looking forward to the genome-mining that I hope is taking place even as we speak! By the way, I put ‘almost’ in the title because Watson asked not to learn his APOE genotype, which would reveal whether he carries the ApoE4 variant.

Scottish DNA Database Being Created at Glasgow Caledonian University

Genealogists interested in researching their Scottish roots will soon have a new resource: a genealogy center created by Glasgow Caledonian University in Scotland. The center will combine traditional genealogical research with recent advances in genetic genealogy to help individuals verify their Scottish roots with DNA testing. According to the Scottish Tourist Board at VisitScotland.com, more than 50 million people throughout the world can claim Scottish ancestry.

Testing will be done by mouth swab in a new forensics lab built at the University. The center will use both Y-chromosome and mtDNA results to build its database. Researchers at the University hope the center will eventually be able to build a genetic map of the clans of Scotland by looking for markers specific to each clan. The test should cost around £60 (about US$120), and a number of people have already expressed interest in it.

The center will also involve the genealogy company Scottish Roots Ltd. and the 1745 Trading Company, a sales and marketing firm.

[Thanks to UK Family Search]