The Genetic Genealogist

Adding DNA to the Genealogist's Toolbox

Archive for the "Genealogy" Category


A Review of AncestryDNA – Ancestry.com’s New Autosomal DNA Test

In the past, I’ve reviewed new autosomal DNA testing options offered by 23andMe and Family Tree DNA:

Today, I’m reviewing the new autosomal DNA test from Ancestry.com called “AncestryDNA.” I’ve already written at length about AncestryDNA, so I won’t cover too many of the basics here.  I have an in-depth introduction to the product located at “Ancestry.com’s AncestryDNA Product,” which you might want to check out before or after reading this review in order to gather more information.

AncestryDNA: An Introduction

The introduction page, which appears after clicking on “View Results” on the front page, consists of my Genetic Ethnicity Summary and the Member DNA Matches (which is further broken into close cousins and distant cousins, as discussed in detail below).  Please note that for purposes of this review I’ve removed the identifying information for my genetic matches.

Genetic Ethnicity Summary:

My genetic ethnicity results, which suggest 90% European and 10% Uncertain, are very interesting.  In a recent webinar, the AncestryDNA team reported that the genetic ethnicity analysis is still very early in the beta phase, and will continue to be updated and refined as new reference populations are added.  Indeed, I’m predicting that over time as new information is added and the algorithm is refined, some or all of my 10% Uncertain will be categorized (perhaps to reflect my maternal Asian and African contributions, which I’ve written about before), and that some of my 90% European may very well change.

Under a heading “About Your Ethnicity” is a pop-up file with more information about Ancestry.com’s ethnicity estimation algorithm.  In that file, under “Is It Accurate,” for example, Ancestry.com provides the following:

When determining your genetic ethnicity, we hold our process and results to an extremely high standard of accuracy.  Our lab’s analysis uses some of the most advanced equipment and techniques to measure approximately 700,000 points in your genome (with at least a 98% rate of accuracy).  We compare that to one of the most comprehensive and unique collections of genetic signatures from around the world.  And as this collection improves over time, it can only get better.

I’m not sure whether AncestryDNA tests only these 700,000 SNPs, or whether it tests more SNPs but currently uses a subset of 700,000 for its analysis.  I’ll try to find this information.

I thought it might be interesting to compare my genetic ethnicity results from the three companies (Ancestry.com, 23andMe, and FTDNA):

Ancestry.com’s AncestryDNA:

  • 78% Scandinavian
  • 12% Central European
  • 10% Uncertain

23andMe’s Ancestry Painting:

  • 98% European
  • 2% Asian
  • <1% African

Family Tree DNA’s Population Finder:

  • 68% European (Northeast European) – Finnish
  • 32% Middle East (Jewish) – Jewish

After reviewing the results one thing is certain: all three companies estimate a strong European contribution to my genome, particularly Scandinavian (ranging from 68% to 78%).  It’s ironic, however, that I have yet to identify a single Northern European ancestor!  I certainly won’t be surprised when one pops up someday.

Clicking on “See Full Results” takes me to a more detailed analysis of my ethnicity results, but not before I click through the following pop-up:

Please keep in mind…Our prediction of your genetic ethnicity is not yet finalized. As we gather more DNA samples and continue our research we expect your ethnicity results to become more accurate and perhaps more detailed.

As I stated above, the ethnicity results are likely to change over time, so be forewarned.

The Full Results page – reproduced below – includes historical and anthropological information about each of the identified regions from your ethnicity profile (Scandinavian and Central European, for me).  It also shows a list of genetic matches who share the relevant region (it’s a long list along the right lower side of the page, but it’s not shown below for privacy reasons).  You can also zoom into the map where ancestors from a tree you’ve linked to your account are displayed.  For example, I have 8 listed in Ireland and 2 in Central Europe.

In summary, Ancestry.com’s AncestryDNA test provides a genetic ethnicity/region calculation based on about 700,000 SNPs and a large collection of both public and proprietary reference databases.  The product can currently categorize DNA into at least 22 different ethnicities/regions, with more to come.  So be prepared for changes to your estimation as their algorithm and databases grow.

Member DNA Matches

Also on the introductory page is a listing of genetic matches.  These are individuals that, based on shared segments of DNA, you are predicted to share a common ancestor with.  An interesting aspect of the DNA matches list, however, is a sliding scale for the relationship confidence level, which ranges from 99% to 10%:

  • 99% Confidence – Immediate Family
  • 99% Confidence – 1st Cousins
  • 99% Confidence – 2nd Cousins
  • 98% Confidence – 3rd Cousins
  • 96% Confidence – 4th Cousins
  • 50% Confidence – Distant Cousins
  • 20% Confidence – Distant Cousins
  • 10% Confidence – Distant Cousins

Accordingly, the introductory page can be customized to only display cousins of a certain confidence level.  If I reduce the confidence level to 96%, for example, I only have two matches (my two predicted fourth cousins shown in the picture above).

Clicking on the “What Does This Mean” link next to the possible relationship range on the “Review Matches” page for each genetic cousin (see the figure below) causes the following information to be displayed, along with some nice inheritance charts:

Predicted Relationship Info: FOURTH COUSIN

It’s interesting to note that (at this degree of separation) we are accurately able to predict only about 85% of the possible relatives that are out there—in other words there is a 15% chance that our DNA analysis does NOT recognize an actual relative of yours. One way to be more certain that the DNA testing captures as many relatives as possible is to have multiple members of your immediate family tested.

It is also interesting to note that at this degree of separation we are sometimes wrong in our prediction of a real relationship. We’ve found that for this relationship about 15% of the time we predict a relationship that cannot be found in any family tree.

This provides some interesting insight into AncestryDNA’s matching algorithm and, accordingly, the algorithm’s results.  For example, it’s important to always keep in mind that there is a roughly 15% chance of incorrectly labeling an individual either as a match or as not being a match.
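To make the point about testing multiple family members concrete, here is a rough back-of-the-envelope sketch in Python.  It assumes (my assumption, not anything Ancestry.com has stated) that each tested family member independently has the same 15% chance of failing to match a true fourth cousin.  Close relatives share much of their DNA, so their misses are not truly independent, and the output should be read as an optimistic illustration only.

```python
# Rough illustration only: assumes each tested family member has an
# independent 15% chance of NOT matching a true fourth cousin.
# Independence is a simplifying assumption, not an AncestryDNA claim.

MISS_RATE = 0.15  # from the "Predicted Relationship Info" text quoted above


def chance_of_detection(testers: int, miss_rate: float = MISS_RATE) -> float:
    """Probability that at least one of `testers` people matches the cousin."""
    if testers < 1:
        raise ValueError("need at least one tester")
    return 1.0 - miss_rate ** testers


for n in (1, 2, 3):
    print(f"{n} tester(s): {chance_of_detection(n):.1%} chance of detection")
```

Under that simplified assumption, a second tester already pushes the chance of catching a true fourth cousin well above the single-tester 85%.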

As the user slides the scale from 99% down to 10%, more results typically appear.  For example, I currently have two 4th cousins listed as matches, 9 matches with 50% confidence, 14 matches with 20% confidence, and 38 matches with 10% confidence.  I expect these numbers to increase considerably once more test results become available.  I don’t know how big the AncestryDNA database currently is, but I’m guessing that only a few hundred to a few thousand people, at the very most, have undergone testing so far.

Comparing Family Trees

The true power of the AncestryDNA test lies in the ability to automatically compare your uploaded family tree with the uploaded family tree(s) of genetic matches.  For example, one of my predicted fourth cousin matches has a public tree with 408 people.  Clicking on “Review Match” takes me to the next page with more information (see the next screenshot) including each of the following:

  • A predicted relationship and predicted relationship range;
  • Our ethnicity comparison (a very cool and potentially very useful feature);
  • My genetic cousin’s entire tree out to 7 generations (and a link to see more);
  • A possible shared ancestor (a “shaky leaf” hint) if one is identified;
  • Surnames that we share in common; and
  • My genetic cousin’s surnames through 10 generations.

I especially like the Genetic Ethnicity Bar (I just made that up, but I guess it fits) comparison, which shows your ethnicity prediction next to your match’s ethnicity prediction.  For example, my fourth cousin displayed in the image below is 93% British Isles and 7% Uncertain.  Since I have no reported British Isles genetic contribution, my Genetic Ethnicity Bar is gray:

On the other hand, if there is some matching ethnicity contribution, the Genetic Ethnicity Bar comparison will look like this:

This genetic match and I, predicted to be distant cousins, both have contributions from Central Europe and Scandinavia.  My match also has British Isles and Middle Eastern, which I am estimated not to have.

Also on the “Review Match” page is a link to send a message to the match (very important for genealogists).  I also like the “Last signed in” information, which lets people know just how active a genetic match might be (and why they aren’t answering your email!).

Common Ancestor and Shared Surnames

As can be seen from the last two screenshots, the list of shared surnames (if there are any) is prominently displayed near the top of the page.  If there was an individual in common between our trees, he or she would also be displayed there.  Unfortunately, when I review the match with each of my possible genetic cousins, I typically have one or more shared surnames, but none have a single identified common ancestor.  I was hoping for such a match, but I’ll have to be a bit more patient.   While I currently have about 55 matches, only some of those have public trees, and even fewer have substantial family trees (larger trees increase the likelihood of identifying a possible shared ancestor, of course).

Conclusion

This post included just a few initial thoughts about my testing experience and results.  I may add more information, or create a new post, as I continue to review my results.  If you have any questions about the testing process or ancestry results that I didn’t address, please feel free to leave a comment.  I’m sure many other people have the same questions, so don’t hesitate to ask.  I’ll also try to get the AncestryDNA team to answer any questions I can’t answer.

While there is currently no information about pricing or about when AncestryDNA will be available, I’m sure that this information will be released soon.

I’m looking forward to your comments, ideas, and questions.

(Disclosure:  I received my AncestryDNA test without charge from Ancestry.com for review purposes and beta testing.  Regardless, I have attempted to review this product as honestly and as objectively as possible in order to provide valuable information about AncestryDNA to my readers.)

Ancestry.com’s AncestryDNA Product

I’ve written before about Ancestry.com’s new AncestryDNA autosomal test.  See, for example:

Webinar with Ancestry.com

Last week, I participated in a webinar with Ancestry.com regarding the AncestryDNA test (although, unfortunately, I had to leave a bit early due to a previous engagement).  It was a great list of about 10 well-known genealogy bloggers, each one of whom is someone I’ve been reading or following for years.  It was an honor to be included among them.

One of the participants was CeCe Moore of Your Genetic Genealogist.  CeCe has a nice summary of the webinar and the important points about the autosomal test and the user interface at “New Information on Ancestry.com’s AncestryDNA Product.”  If you’re interested in autosomal DNA testing, or in Ancestry.com, I highly recommend reading her post.

The Power of DNA

The highlight of the webinar – and of the AncestryDNA product – was the combination of DNA and family trees.  I’ve said before that the ability to combine DNA and the paper trail is the future of genetic genealogy, and the true power of DNA.

The AncestryDNA test automatically compares your family tree (if you have one hosted at Ancestry.com) to the family tree of your genetic matches (if they have one hosted at Ancestry.com, and if it’s public).  The user interface then suggests overlapping individuals that might be the source of the shared DNA!  The user interface presents this information as a “Potential Common Ancestor,” and provides it as a “shaky leaf” hint.  Thus, as with all shaky leaf hints, it should be subjected to further research and not blindly accepted.
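The tree comparison described above can be pictured with a small Python sketch.  Ancestry.com has not published how its matching works, so everything here is a hypothetical illustration: the `Person` record, the choice to match on a normalized name plus birth year, and the sample names are all my own assumptions.

```python
# Hypothetical sketch of comparing two family trees for overlapping people.
# This is NOT Ancestry.com's algorithm; the matching rule is an assumption.
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    birth_year: int

    def key(self):
        # Normalize case and whitespace so "john  doe" equals "John Doe".
        return (" ".join(self.name.lower().split()), self.birth_year)


def potential_common_ancestors(my_tree, match_tree):
    """Return people in my_tree who also appear (by key) in match_tree."""
    match_keys = {p.key() for p in match_tree}
    return [p for p in my_tree if p.key() in match_keys]


# Made-up example trees.
mine = [Person("John Doe", 1790), Person("Mary Roe", 1795)]
theirs = [Person("john  doe", 1790), Person("Ann Poe", 1801)]

hints = potential_common_ancestors(mine, theirs)
print(hints)
```

Like any shaky leaf hint, a hit from a rule this simple would only be a starting point for research: two different men named John Doe born in the same year would produce a false match.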

You can also see the first 7 generations of each genetic match in your user interface (again, if their tree is public), another great benefit.

While there are of course MANY caveats to this matching algorithm, it eliminates a time-consuming step in sharing information with genetic matches, as many of us know from [many hours of] experience.  (I didn’t get a chance to ask if the matching algorithm takes into account the predicted relationship range of the genetic cousins being matched, but I’ll try to get that information for you.)

If you think about it for a moment, the power of this approach is mind-boggling.  Over time it will create a mesh of DNA and genealogies, with individual data points that can be confirmed or rejected based on the results of numerous test-takers.  In other words, there will be an enormous DNA family tree.  Not only that, but that enormous DNA family tree can then be used to test genealogical hypotheses (was John Smith’s mother a White?  was John Smith Jr. adopted? etc…).  While a long way down the road, the possibilities are endless.

Concerns About Combining DNA and Family Trees

I know there is a lot of criticism and concern about the quality of third-party genealogies on Ancestry.com.  It’s impossible to know just how subjective or objective the data in any given tree is.  It’s true that there will always be concerns about third-party genealogies, and that there will be many, many errors as genealogists begin to tie DNA to specific ancestors.

But these concerns are equally true for paper records.  Any time you tie a paper record to a certain individual in your family tree, there’s a serious possibility of error, and this error can be propagated throughout numerous genealogies.  Every genealogist has seen this before, probably many times. But the fact that we’ve recognized the error likely means that the error has been corrected through careful research.

There is nothing different or exceptional about tying DNA to ancestors.  Any time you tie a piece of DNA to a certain individual in your family tree, there’s a serious possibility of error.  Over time, however, careful and methodical research – likely contributed by many different test-takers – will allow genealogists to make the most reasoned and knowledgeable judgment.

There’s enormous power in numbers.

A Roundup of AncestryDNA Posts

Here’s a complete roundup of posts around the genealogy blogosphere about Ancestry.com’s new Autosomal DNA product (AncestryDNA):

Did I miss any?  Feel free to mention them below.

Disclosure: I received a free beta test from Ancestry.com, although I have not yet received my results (I will receive them this week, I believe).  However, I have tried to review this product objectively.

Genetic Genealogy at Public Radio International

PRI’s The World, a weekday radio news magazine, has a new piece by producer Carol Zall entitled “Roots 2.0: Using DNA to Trace My Ancestry.”  The piece makes for a great introduction to genetic genealogy.  I especially like the 35-year-old interview between the young Carol and her grandmother, as well as Carol’s interpretation of her results.

I spoke with Carol a few months ago about this piece, and she included a few quotes from the interview in the article.  Also included is a 2-minute soundbite of our conversation:

Also featured in the main article are the always-fantastic Daniel MacArthur and Joe Pickrell (you can find both of them at Genomes Unzipped).

Both Daniel and I also contributed short companion pieces:

Ancestry.com’s Autosomal DNA Product – An Update

This morning’s keynote at Rootstech 2012 was from Ancestry.com and was entitled “Making the Most of Technology to Further the Family History Industry.”  Although I was unable to attend Rootstech in person this year, I was able to view the keynote online.

During the panel discussion, we heard from Ken Chahine (LinkedIn profile), the Senior Vice President and General Manager, DNA at Ancestry.com.  From his profile at Ancestry.com:

Ken Chahine has served as Senior Vice President and General Manager for Ancestry DNA, LLC since 2011. Prior to joining us he held several positions, including as Chief Executive Officer of Avigen, a biotechnology company, in the Department of Human Genetics at the University of Utah, and at Parke-Davis Pharmaceuticals (currently Pfizer). Mr. Chahine also teaches a course focused on new venture development, intellectual property, and licensing at the University of Utah’s College of Law. He earned a Ph.D. in Biochemistry from the University of Michigan, a J.D. from the University of Utah College of Law, and a B.A. in Chemistry from Florida State University.

During the keynote Dr. Chahine discussed the “revolution in the science of genomics” that many people really don’t appreciate yet.  He noted that this revolution is driving all sorts of new products and development.

Dr. Chahine stated that genealogists have been doing a good job so far of using DNA for family history, but that its use has been pretty modest, with genealogists typically turning to DNA when there is a problem.  With the revolution, however, “DNA is going [to] turn into content.”  We can now look at millions and millions of markers throughout the genome regardless of male or female.  There are about 100 errors per generation, which are “breadcrumbs” or clues left by our ancestors about where they were in the past.  We will be able to get to the point where we can analyze and use that DNA content to tell us things like:

“what town did they live in in the past, and when did they live there, and things like that that are really going to revolutionize, I think, the way we think about DNA.”

In response to a question from the panel leader about the computational and analytical challenges to autosomal DNA products, Dr. Chahine noted that he has been building a team of computational biologists knowledgeable about DNA who have been creating and refining algorithms to analyze the data and present it in meaningful ways to users.

The panelists were also asked what would be one of the biggest changes to genealogy over the next 10 years.  Dr. Chahine offered the following:

“We’re also going to integrate DNA into records in a way that people may not think is immediately obvious, but the DNA is also going to help pick out who the right John Doe that you’re looking for in the future, and we’re working on things like that.”

Hearing from Dr. Chahine was extremely interesting, educational, and entertaining.

Why Autosomal DNA Testing?

It is clear that Ancestry.com is investing considerable amounts of time and money in their new autosomal DNA offerings.  Why would Ancestry.com spend so much time and money getting into the autosomal DNA business?  There are at least several important reasons, not the least of which is access to an enormous genealogy-minded consumer database (~1.7 million current subscribers to Ancestry.com, I believe).

However, perhaps the single most important reason for Ancestry.com to get into the autosomal DNA business is their almost-unrivaled ability to combine the results of DNA testing with an enormous database of traditional records.  Combining the results of autosomal DNA with family trees and paper records is, of course, the future of genetic genealogy.  Ancestry.com users have already been combining paper records with their family trees.  I, for example, have digitally connected numerous census and other records to individuals within my uploaded family tree.  In 2012 we will be able to add autosomal DNA as yet another layer to our family trees. For example, if John Doe and I both have family trees uploaded to Ancestry.com, and our testing reveals that we have shared DNA, we can connect that shared DNA to our shared ancestors.

In the not-so-distant future, once we have this massive combination of trees, records, and DNA, we might even be able to ask very advanced questions that we can currently only dream of:

  • What DNA/genes found today traveled to North America on the Mayflower?
  • Given my known family tree and my autosomal test results, from what ancestral individual in the Ancestry.com database might I have inherited this portion of DNA?
  • Based on the shared DNA of his descendants, please recreate the genome of my great-great-great-great-great grandfather John Doe.

It is important to understand that while the amount of both information and computing power necessary for these types of questions is enormous, it will likely be within the ability of the field over the next 5-20 years.

Are there any [currently outrageous] questions you can only dream of asking today but think might be answerable in the future using DNA?

A Preview?

In anticipation of the NBC series Who Do You Think You Are?, Ancestry.com released several video promos.  One of these promos (HERE) includes video at 1:02 of one of the celebrities reviewing what appears to be an ethnicity analysis (entitled “Genetic Ethnicity”) of his autosomal DNA, as well as the identification of a distant cousin.  (Thanks to Cece Moore for pointing to the video; she in turn thanks Shannon!)  The interface states that “Ancestry.com’s DNA analysis looks at your recent ethnicity, going back about 10 generations.”

According to the interface shown in the video, which is likely to be an early version, the test breaks down biogeographical ancestry not only into broad continental categories such as “European” and “African,” but also into regions within those categories.  For example, the results shown in the video are 74% African and 20% European.  Under the “African” tab, the results show 27% Bamoun, 22% Brong, 13% Yoruba, and 12% Igbo (a total of 74%!).

The interface also shows the locations of these groups superimposed on a map of Africa, as well as nodes which appear to represent connections (possibly genetic cousins) in those populations.  Clicking on a node, for example, brings up what appears to be a genetic cousin and shows the predicted relationship (here, a 10th cousin), various biographical information (including date of birth), a link to view the individual’s tree, and a contact link.

For More Information

Cece Moore at Your Genetic Genealogist also has a great series of posts about Ancestry.com’s new Autosomal DNA product:

Be sure to follow The Genetic Genealogist, and I’ll be sure to share the latest information about Ancestry.com’s Autosomal DNA product with you.

Does DNA Link 1991 Killing to Colonial-Era Family?

The genetic genealogy world is abuzz following a recent report in news outlets around the world (including CNN, Seattle PI, Daily Mail, etc) that investigators have used public genetic genealogy DNA databases for leads in a 20-year-old cold case.

The Case

In December 1991, 16-year-old Sarah Yarborough was tragically murdered in Federal Way, Washington.  Despite an extensive investigation, no suspect has ever been named.  Investigators have sketches of a man they believe might have been involved, but there is no name to put to the pictures.

Investigators did find some important evidence however: DNA left at the scene, possibly by Yarborough’s attacker.

The DNA

Late last year, investigators gave the DNA profile (apparently the Y-DNA profile) to California-based forensic consultant Colleen Fitzpatrick (whom I’ve written about before here on TGG).  Fitzpatrick, it appears, compared the Y-DNA profile to publicly-available Y-DNA databases, such as Ysearch, in an attempt to identify a potential match for the profile.  After identifying potential matches, Fitzpatrick could then potentially identify the surname of the Y-DNA’s donor.  For example, if all Bettingers have a particular Y-DNA profile and a sample Y-DNA profile closely matches that particular Y-DNA profile, then it is likely that the parties are either closely or distantly related (on a scale of tens or thousands of years), and they could potentially have the same surname.

Therefore, by comparing an unknown’s Y-DNA profile to public databases, it is possible to find matches and potentially identify a surname for the owner of that Y-DNA (but see “The Caveats,” below).
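As a rough illustration of what comparing a Y-DNA profile to a database involves, the Python sketch below counts how many STR markers differ between an unknown profile and each database entry, and reports the surnames of the closest entries.  The marker values, the surnames attached to them, and the mismatch threshold are all made up for illustration, and simply counting mismatched markers is a simplification of how genetic distance is usually calculated.  This is not a description of Fitzpatrick’s actual method or of any real database.

```python
# Illustrative only: made-up Y-STR profiles and a simplified distance measure.

def mismatches(profile_a: dict, profile_b: dict) -> int:
    """Count shared markers whose repeat values differ."""
    shared = profile_a.keys() & profile_b.keys()
    return sum(1 for marker in shared if profile_a[marker] != profile_b[marker])


def candidate_surnames(unknown: dict, database: list, max_mismatches: int = 1):
    """Return (surname, mismatches) pairs at or under the threshold, closest first."""
    hits = [
        (entry["surname"], mismatches(unknown, entry["markers"]))
        for entry in database
    ]
    hits = [hit for hit in hits if hit[1] <= max_mismatches]
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical database entries (all values are invented).
database = [
    {"surname": "Bettinger", "markers": {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}},
    {"surname": "Samuels", "markers": {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 10}},
    {"surname": "Johnson", "markers": {"DYS393": 12, "DYS390": 22, "DYS19": 15, "DYS391": 10}},
]

unknown = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}

print(candidate_surnames(unknown, database))
```

Note that the sketch returns two candidate surnames for a single profile, which is exactly the kind of ambiguity discussed under “The Caveats,” below.  A real comparison would also need to check that enough markers were compared in the first place.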

The Search

Fitzpatrick’s research determined that the suspect’s Y-DNA profile appears to match the Y-DNA profiles of individuals with the surname “Fuller.”  Although unclear without more information, it further appears that the suspect’s Y-DNA profile specifically matches the Y-DNA profiles of purported descendants of Robert Fuller, who settled in Salem, Mass. in 1630.

Accordingly, Fitzpatrick’s research has merely suggested that the suspect MIGHT have the surname Fuller.  Nothing more, nothing less.  It is merely a lead, something that investigators will have to devote countless hours to following up on.  The lead has not provided investigators with a magical solution to their mystery, and following this discovery they are likely not all that much closer to identifying a suspect than they were before.

The Caveats

It is important to note that there are some serious caveats to this process.  Just because an unknown Y-DNA profile matches a group of surnames in a database does not automatically mean that the unknown Y-DNA donor had the same surname.  Non-paternal events such as infidelity, adoption, name change, and others can – and have – resulted in surnames being jumbled throughout history.  Thus, simply matching the unique Bettinger profile does not mean that your last name might be Bettinger; it could be Samuels as a result of great-grandpa’s roving eye, Smith as a result of your step-great-great-grandmother’s love for orphans, or Johnson because your father was tired of people spelling “Bettinger” wrong.  For all these reasons surnames have changed over time.

It is even more vital to note, however, that Fitzpatrick’s research process is absolutely neither a new nor a groundbreaking technique! It is a familiar technique that has been done MANY times before, and continues to be done.  People – including non-genealogists – have used public databases to attempt to identify their surname and/or family.  Indeed, Family Tree DNA itself has noted that male adoptees have a 30-40% chance of identifying a likely surname by comparing their Y-DNA profile to FTDNA’s database (see here: “During the introduction Max [Blankfeld] stated that 30%-40% of male adoptees find their likely surname in FTDNA’s database”).

The Concern

Some, including both experienced genetic genealogists and people who have never had a DNA test, have expressed concern that their DNA was or could be used for this purpose, a purpose that it “wasn’t intended to be used for.”  Some have stated that the search constituted an “illegal seizure” of their property, or that their DNA should not be used by “big brother.”

Further, as the ISOGG mailing list for project administrators has demonstrated, many project administrators are concerned that this hullabaloo will scare away potential test-takers.

The Past

Despite the concerns of the public, genetic genealogists, and project administrators, Fitzpatrick’s process is neither a new technique nor a frightening one.  It has been done before.  Further, Fitzpatrick’s process is simply a new twist on an old method.  How is Fitzpatrick’s DNA search different, for example, from any of the following (and please don’t throw any genetic exceptionalism arguments my way!):

  • Using a public reverse-phone lookup to identify the owner of a phone number?  I didn’t authorize my phone number for that use;
  • Searching through a public phone book to identify all the Bettingers in New York state? I didn’t authorize my phone book listing for that purpose;
  • Using the census to identify my ancestors? I guarantee that NONE of my ancestors authorized the use of the census for genealogical research (indeed, just think of ALL the secrets that have been revealed in the census that our ancestors would have wanted buried forever!).

Interestingly, genealogists happen to be the biggest offenders when it comes to using public databases for purposes other than those for which they were intended.

My Thoughts

One of the most interesting points to me is where some genealogists have decided to draw their line in the sand.  Comparing a person’s Y-DNA profile to public databases is fine if the person is an adoptee searching for his last name, but not if the person is a criminal that investigators need to identify.

I also believe that project administrators are overly concerned.  These types of stories come and go, and this one will fade away just as all the others have.  We are (I sincerely hope) heading into an era of genetic openness, not one of genetic fear.

Lastly, the answer to this dilemma is, as always, education.  We have to educate the public and potential test-takers that if they decide to make their Y-DNA public, it will be public for any purpose any person sees fit.  They should understand this when they send in their cheek swab.  The danger to test-takers, however, is almost nil; a public Y-DNA profile is either incomprehensible or useless for 99.99% of the world.  And keep in mind that if a criminal is identified using this method, it is the criminal activity that endangered him, NOT the public Y-DNA databases!

Your Comments

What I’m really looking for here is a conversation about the pluses and minuses of Fitzpatrick’s method and the use of public DNA databases.  Are there valid concerns, or only concerns due to the lack of education?  Why do you believe these methods are different from non-traditional uses of other public databases such as the examples I listed above?  Why do you think people might be afraid of this use of their public DNA?  And how can we better educate test-takers and the public to avoid these types of concerns?

[Note: I will immediately delete any comment that is aimed at Fitzpatrick herself.  She did not invent these search methods, and should not be held responsible for their use.  I'm looking for comments about the method, not the investigator.]

23andMe Announces 80x Exome Sequencing for $999

Yesterday, at Health 2.0 in San Francisco, 23andMe announced that it will be offering sequencing of exomes with 80x coverage for $999.  At Exome 80x, 23andMe discusses their test:

Your exome is the 50 million DNA bases of your genome containing the information necessary to encode all your proteins. Informally, you can think of the exome as the DNA sequence of your genes.

Your entire genome is made up of your exome plus other DNA, consisting of three billion bases with repetitive sequences, sequences of unknown function, and DNA that does not code for proteins.

Note that the Exome 80x test is only available to current customers, and is determined on a “first come, first served” basis.  Further, test-takers will initially only receive their raw data of 50 million DNA bases at 80x coverage, but 23andMe plans to develop new tools to take advantage of exome sequencing.

The Exome?

Many non-geneticists will no doubt be wondering what the “exome” really is.  The exome is the protein-coding portion of your genome, and comprises roughly 1.5% of the total genome.
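For readers who like to see the numbers, here is a quick back-of-the-envelope calculation in Python using only the figures quoted from 23andMe above (50 million exome bases, three billion genome bases, 80x coverage).  Treating “80x” as each base being read about 80 times on average is the usual meaning of coverage; the variable names are simply my own.

```python
# Back-of-the-envelope arithmetic using the figures quoted from 23andMe above.
EXOME_BASES = 50_000_000       # "50 million DNA bases"
GENOME_BASES = 3_000_000_000   # "three billion bases"
COVERAGE = 80                  # "80x": each base read ~80 times on average

exome_fraction = EXOME_BASES / GENOME_BASES
bases_sequenced = EXOME_BASES * COVERAGE

print(f"Exome share of genome: {exome_fraction:.2%}")
print(f"Approximate bases read at 80x: {bases_sequenced:,}")
```

With 23andMe’s round numbers the exome works out to a little under 2% of the genome, in the same ballpark as the rough 1.5% figure, and 80x coverage means on the order of four billion bases are actually read to produce the 50 million bases of data.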

For insight into what type of information might be gleaned from exome data, Daniel MacArthur has an article entitled “Venter’s exome, and the challenge of rare variants for personal genomics” from August, 2008.  In the article, he discusses some of the findings from the analysis of J. Craig Venter’s exome.

The Genealogist’s Exome

As a genetic genealogist, I was of course interested in the ramifications of exome testing on testing for genetic ancestry purposes.  23andMe states the following on their Exome 80x page:

Exome data are less suitable for ancestry or genealogical research, since they will not provide mitochrondrial sequence or much information on the Y chromosome.

This is a strange sentence, and one I believe wasn’t properly screened.  In my experience, few genealogists pursue 23andMe testing for the mtDNA or Y-DNA results.  Most genealogists use 23andMe for its autosomal DNA testing and for tools like Ancestry Painting and Relative Finder, and it’s far too early to tell whether genealogists will be able to make use of exome sequencing (of course we will!).

I hope this sentiment does not discourage genetic genealogists from pursuing the Exome 80x product.  Genealogists have been – and continue to be – among the very first adopters of new DTC DNA testing (including 23andMe’s very first product back in the 2007 to 2009 time frame).  Indeed, genealogists have been driving the DTC genetic testing market since 2000 with the launch of Family Tree DNA!

The Possibilities

One of the most exciting uses of the Exome 80x product might be in self-directed discovery of rare variants in genetic disorders.  Numerous rare genetic diseases, many of which likely result from unidentified rare variants, have not been exhaustively studied.  At least one group has estimated that 85% of disease-causing mutations are found in the exome.

I can envision 23andMe Community Projects for rare genetic disorders, similar to its Parkinson’s Community but much smaller in size, where several members of a family purchase the Exome 80x sequencing in an attempt to identify variants that might be involved in the disease.  These projects may be sponsored and supported by 23andMe, or might simply be a family attempting to analyze their genomes themselves.

Other Viewpoints:

Will you be signing up for 23andMe’s Exome 80x product?

“My Beautiful Genome” by Lone Frank

Lone Frank, a journalist and author with a Ph.D. in neurobiology, has just published her fourth book, entitled “My Beautiful Genome: Exposing Our Genetic Future, One Quirk at a Time” (available for pre-order at Amazon).  A chapter of the book is available here (pdf).

Frank describes her book thusly: “This book is my very personal take on personal genomics. It chronicles my meetings and interviews with leading scientists and lays out the – somtimes [sic] disquieting – discoveries I make in my own genome.”

The book is described as follows at Amazon:

“Internationally acclaimed science writer Lone Frank swabs up her DNA to provide the first truly intimate account of the new science of consumer-led genomics. She challenges the scientists and business mavericks intent on mapping every baby’s genome, ponders the consequences of biological fortune-telling, and prods the psychologists who hope to uncover just how important our environment really is – a quest made all the more gripping as Frank considers her family’s and her own struggles with depression.”

I haven’t read the book myself, although I will soon be receiving a review copy.  Once I’ve finished it, I’ll write more about the book here at the blog.  There is a recent write-up of Frank’s experiences at the Daily Mail entitled “If the blues genes fit…”

I’m most interested to see what Frank finds in her genome, and how she interprets and uses her data beyond the interpretation provided by the testing companies.

Family Tree DNA’s 7th International Conference on Genetic Genealogy Announced

Family Tree DNA has announced the 7th Genetic Genealogy Conference for Family Tree DNA Group Administrators, to be held in Houston, Texas on November 5th and 6th, 2011.

Featured speakers at the meeting include the following:

Another interesting speaker at the meeting will be Jessica L. Roberts, J.D., an Assistant Professor of Law at the University of Houston Law Center (recent C.V. here (pdf)).  Although it’s not clear what Roberts will be speaking about, her recent publications (pdf) focus on genetics and the law, including the Genetic Information Nondiscrimination Act.  Kudos to Family Tree DNA for again bringing together a wide array of viewpoints and opinions at the conference.

——————————————————-

Unfortunately I will be unable to attend the conference this year, although I made it last year and hope to make it to the next conference.  I look forward to live-tweeting of the conference, which is the next best thing to being there.  Are you attending the conference?

Interpretome: New Analysis Software for Autosomal Testing Results

Daniel MacArthur tweeted this morning about “Interpretome,” which is browser-based software that can be used to examine autosomal testing results from 23andMe and Lumigenix.  There is also an interesting blog post about the software at the blog of Konrad J. Karczewski, one of the co-creators of the software, and one by Daniel at Genomes Unzipped.

Users load their raw data files, and then can use that information to explore their genome.  There are a number of different exercises that a user can run through with their data, including health issues (diabetes, warfarin sensitivity, many other diseases, etc.), ancestry analyses, and determination of “Neanderthal SNPs,” which are SNPs that have been suggested to derive from Neanderthal ancestry (note that this science is still VERY early stage and subject to change OFTEN!).

There are two features that I find very interesting.  First, there is an “Advanced Settings” tab where users can make several important adjustments to their analysis.  Second, the site allows for “imputation” when looking up a SNP, which means that “If the SNP is not found in your file, the utility attempts to impute the SNP using Hapmap data.”  I haven’t tried this yet, but it will be interesting to see how well it works.

Ancestry Information

Interpretome allows users to create, among other things, an “Ancestry Painting” using either HapMap2 or HapMap3 data.  For example, using the HapMap2 data, my Interpretome ancestry painting is very similar to my 23andMe ancestry painting.  For those who aren’t familiar, here are the HapMap2 populations (HapMap3 can be found here):

YRI (Ibadan, Nigeria)

CEU (Northern/western Europe)

CHB+JPT (Beijing, China and Tokyo, Japan)

Medically-Relevant Information and Privacy Issues

The creators of Interpretome do address several issues, including the medical information controversy:

No information should be considered diagnostic and as with any genetic testing service, the interpretation is not regulated by the FDA.

And the important privacy issue:

Your genome will not be sent to any server, it remains on your computer. This website will make requests to a database that only contain “rsid” (without genotypes) and “population” (self-reported in the top-right) information. At no point will any genotypes be sent across the wires (all computation will be done in the browser).

However, the creators do go on to note that some exercises have the option of submitting personal information, which “will be anonymously stored on a secure server.”  So be cautious if you’re worried about privacy, as with any testing or analysis service.  As my genome is public domain, I’m not concerned.

Family Tree DNA Results?

For fun, I also tried my Family Tree DNA results.  Since FTDNA raw data results contain few, if any, medically-relevant SNPs, most of the “exercises” were fruitless.  I did have some luck with the ancestry sections, although I have yet to compare my 23andMe analysis with my FTDNA analysis to determine if there is consistency.

Using Autosomal DNA Testing to Identify An Adoptee’s Roots

The Mystery

Helen Marley Johnson, my great-grandmother, was born to unidentified parents on March 3, 1889, in Oswego County, New York.  Although I didn’t really know Marley, I remember meeting her when I was very, very young, just before she died in 1983.

Copyright Blaine T. Bettinger

Marley lived in Oswego and Jefferson counties for all her long life.  She was married twice, had two children, and today has numerous descendants located throughout the United States and the world.  However, by the time Marley was 13 years old, she had been adopted by at least three different families, eventually marrying into the last family that adopted her.

Since I began my genealogical research more than 20 years ago, I’ve worked to find the parents of Marley Johnson, without much success.  I have a plethora of data about the entire remainder of her life, but almost nothing about her ancestry.  For example, although I’ve found her birth certificate, it lists her mother as Minerva Johnson (a name that may or may not be real, and which I’ve found nothing on) and lists her father as “unknown.”

Autosomal DNA

Autosomal DNA testing presents the most promising new avenue of research into Marley’s ancestry.  Unfortunately, both of Marley’s children have been dead for more than 30 years.  However, Marley has several living grandchildren, including my father and a first cousin named Edgar (name changed for privacy reasons).  By comparing my father’s autosomal results with those of his first cousin, it is possible to identify stretches of their DNA that they inherited from Marley and her husband Frank Bettinger.  Here’s why:

Both my father and Edgar are grandchildren of Marley and Frank: my father is the son of Marley and Frank’s son, and Edgar is the son of Marley and Frank’s daughter.  Approximately 25% of my father’s DNA comes from Marley, and approximately 25% of Edgar’s DNA comes from Marley.  Although it is not the same 25% in both cousins (because the children inherited random pieces of Marley’s DNA and then passed on random pieces of that DNA to their children), it is statistically nearly certain that they will share some of Marley’s DNA.  Indeed, first cousins are predicted to share 12.5% of their DNA, about half of it through each shared grandparent (6.25% of their DNA shared through Marley, and 6.25% shared through Frank).  Both will have much more DNA from these ancestors, but it won’t be shared between them.
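The percentages above follow from simple halving at each generation.  Here is a minimal sketch in Python; the function name and the way relationships are encoded are my own for illustration, and the results are averages only, since actual sharing between real relatives varies around these values:

```python
from fractions import Fraction


def expected_shared(paths):
    """Expected fraction of autosomal DNA that two relatives share.

    `paths` holds one (generations_a, generations_b) pair per common
    ancestor: the number of generations from each relative back to that
    ancestor.  Each common ancestor contributes
    (1/2) ** (generations_a + generations_b) on average.
    """
    return sum(Fraction(1, 2) ** (a + b) for a, b in paths)


# First cousins: two shared grandparents, each two generations back for both.
first_cousins = expected_shared([(2, 2), (2, 2)])

# Contribution of a single shared grandparent (for example, Marley alone).
per_grandparent = expected_shared([(2, 2)])

# First cousins once removed: one relative is a generation further down.
once_removed = expected_shared([(3, 2), (3, 2)])

print(float(first_cousins), float(per_grandparent), float(once_removed))
```

This reproduces the 12.5% figure for first cousins and the 6.25% contribution from each grandparent, and it gives 6.25% in total for first cousins once removed.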

By comparing the autosomal DNA testing results of my father with Edgar’s, it will be possible to identify the DNA that they have in common.  Because they only share Marley and her husband Frank as ancestors (an important assumption here), any DNA they have in common must be DNA that they inherited from Frank and Marley.

Of course, this is dependent upon Edgar and me not sharing any DNA from other ancestors, for example on my maternal side.  If we shared other ancestors, it would be much more difficult (but not impossible) to identify which DNA came from which ancestors.  However, given Edgar’s paternal ancestry – the side which does not involve Frank and Marley – this is exceedingly unlikely (but will be kept in mind during future analysis).

Results

I now have autosomal DNA results for Edgar and myself using Family Tree DNA’s Family Finder, and more specifically using their new Illumina OmniExpress chip.  The figure below highlights the regions of our genomes where we share at least 3cM stretches of DNA.

Note that I’ve used my DNA for this test, rather than my father’s, simply because I have yet to test my father.  The numbers change slightly, as I’m predicted to share 6.25% of my DNA with Edgar, my first cousin once removed.  We share about 333 cM (268 million base pairs), which I’ve calculated to be about 4.4% of our genomes (please chime in if you think this estimate is incorrect, as I haven’t had sufficient time to explore it).

With this map and the data that comes from it, I’ve identified portions of my genome (and Edgar’s) that come from Marley and Frank.  Although I don’t know which portions came from whom, I have a wealth of information I can now use to explore our shared ancestry.
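For anyone who wants to check the percentage, here is a rough sketch in Python.  The two denominators are assumptions on my part rather than numbers from the test report: roughly 6,800 cM and roughly 6 billion base pairs, both counting the two copies of every chromosome.  Different companies and genetic maps use somewhat different totals, so treat the output as a ballpark only:

```python
# Measured sharing reported above.
SHARED_CM = 333
SHARED_BP = 268_000_000

# ASSUMED denominators (not from the test report): approximate totals
# counting both copies of each chromosome.
TOTAL_CM = 6_800
TOTAL_BP = 6_000_000_000

# Average sharing predicted for first cousins once removed.
EXPECTED_FRACTION = 0.0625

pct_by_cm = 100 * SHARED_CM / TOTAL_CM
pct_by_bp = 100 * SHARED_BP / TOTAL_BP

print(f"By centimorgans: {pct_by_cm:.1f}%")
print(f"By base pairs:   {pct_by_bp:.1f}%")
print(f"Expected:        {100 * EXPECTED_FRACTION:.2f}%")
```

Under these assumptions both estimates land between 4% and 5%, close to the 4.4% figure and somewhat below the 6.25% prediction.  Some shortfall is not surprising, since the prediction is only an average and stretches shorter than the 3 cM threshold are not counted.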

Now What?

So now what?  Now, I wait for matches shared by Edgar and me, people who share one or more of these stretches of DNA.  Currently, we do not share any matches.  If another individual shares a piece of the identified DNA, it is likely that they are related through Frank and Marley.  As I have a great deal of information about Frank’s ancestry, I can try to narrow down the matches to Marley’s ancestry.  This, of course, presents one of the biggest challenges of this approach.

Further, identifying relatives is only the very first – and the easiest – step.  Once I have identified someone who might be Marley’s biological relative, I have to obtain as much of their genealogical tree as they are willing to share in order to mine it for information.  I will be looking for families that lived in or migrated through the Upstate New York area in the early 1880s.  Of course, I must consider all the descendants of any potential relatives as well.
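Checking whether a new match falls on one of these stretches is a simple interval-overlap test.  The sketch below is in Python with entirely hypothetical names and positions; it is not based on any company’s file format or tools:

```python
from typing import NamedTuple


class Segment(NamedTuple):
    chromosome: int
    start: int  # base-pair position where the shared stretch begins
    end: int    # base-pair position where the shared stretch ends


def overlaps(a: Segment, b: Segment) -> bool:
    """True when two segments lie on the same chromosome and intersect."""
    return a.chromosome == b.chromosome and a.start < b.end and b.start < a.end


def candidate_matches(shared, matches_by_person):
    """Return the names whose matching segments overlap any shared segment."""
    return sorted(
        name
        for name, segments in matches_by_person.items()
        if any(overlaps(s, m) for s in shared for m in segments)
    )


# HYPOTHETICAL data: stretches shared with Edgar, and segments shared
# with three made-up matches.
shared_with_edgar = [
    Segment(1, 10_000_000, 25_000_000),
    Segment(7, 50_000_000, 62_000_000),
]
my_matches = {
    "Match A": [Segment(1, 20_000_000, 30_000_000)],  # overlaps on chromosome 1
    "Match B": [Segment(7, 70_000_000, 80_000_000)],  # same chromosome, no overlap
    "Match C": [Segment(3, 10_000_000, 25_000_000)],  # different chromosome
}

print(candidate_matches(shared_with_edgar, my_matches))
```

An overlap in position is only a first filter.  Because everyone carries two copies of each chromosome, a candidate should also match Edgar on the same stretch before being treated as a likely relative through Frank and Marley.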

Yes, it’s a great deal of work, and there is no guarantee that I will ever identify a link.  For example, what if John Doe, Marley’s father, took an undocumented vacation in Upstate New York to visit his best friend and had a fling with Marley’s mother?  I may not be able to uncover that connection either in paper records or in DNA, at least for now.

My best bet is to accumulate as much information as possible – paper records, DNA, gedcoms, family trees, etc. – and slowly create a web of paper and DNA.  This web will undoubtedly slowly reveal overlapping information that hints at Marley’s ancestry.  For example, there may only be one potential male individual who possesses DNA from family X, DNA from family Y, and DNA from family Z, all of which Marley inherited and which Edgar and I share.  A needle in a haystack, but an exciting possibility nonetheless.

The Future

In the future, I can attempt to mine existing genomes for more data, for example by comparing the DNA of my father’s siblings with Edgar’s DNA.  Statistically, they will share different portions of their genomes with Edgar, allowing me to more completely identify the DNA in Edgar’s genome that came from Frank and Marley.  Since Edgar is the extent of the other line, and Marley’s children are dead, this is the best I can currently do (until I can sequence Marley’s DNA directly from the stamps and letters she licked and I’ve saved).

Conclusion

Essentially, using autosomal DNA testing and the approach described above, I have re-created portions of my great-grandparents’ genomes by identifying bits and pieces of their DNA in living individuals.  What an exciting time to be a genealogist.

Now let me know, do you have any tips or suggestions for me as I continue my hunt for Marley’s parents?  If so, please share them below.