There has been much discussion (see here and here for a few examples) of the so-called “Scandinavian Problem” with AncestryDNA’s ethnicity estimate, in which certain populations appeared to be over-represented in the reference panel utilized by Ancestry.com. I, for example, have no documented Scandinavian ancestry, but my estimate came back as 78% Scandinavian. Many others experienced the same issue.
The AncestryDNA team was well aware of the issue, and has been working on an update to its ethnicity algorithm, reference panel, and user interface. Indeed, at “The First DNA Day at the Southern California Genealogy Society Jamboree” in June of this year, Ken Chahine (Senior Vice President and General Manager, DNA) gave a presentation in which he announced that the ethnicity calculations at AncestryDNA were undergoing a complete overhaul and that a major update would be provided to all customers later this year.
A Limited Launch Today
Today, Ancestry.com announced on its blog (see “A Sneak Peek Into The AncestryDNA Ethnicity Update – Coming Soon To Your DNA Results!”) that as of today they “had launched a preview of the new features and results to a small random group of AncestryDNA members, which will be released to everyone in the next few months.”
According to the AncestryDNA team, about 6,000 people received the updated estimate today. The new Ethnicity Estimate is free, and does not require any additional testing by the customer. Based on what I read and heard today, the remainder of AncestryDNA customers should receive their updated ethnicity estimates within the next 1 to 3 months.
Also today, several members of the genealogy blogging community attended a webinar in which the AncestryDNA team presented the updated ethnicity estimate interface and the science behind the update, and answered our questions. As I’m sure other members of the community will agree, Ancestry.com’s transparency is greatly appreciated and benefits us all.
The Quick Summary:
Here are some of the highlights, if you’re in a hurry:
- About 6,000 AncestryDNA customers received new Ethnicity Estimates today, and the rest should follow over the next few months;
- There are now 26 reference populations that your DNA is compared to;
- The new Ethnicity Estimate appears to detect very small percentages of genetic ethnicity (even under 1%);
- Each estimate is the average of 40 different analyses using an algorithm called ADMIXTURE; and
- Estimates are provided with a likely range.
My AncestryDNA Ethnicity Estimate
For the sake of comparison, here is my previous estimate from AncestryDNA (note the NEW Ethnicity Estimate Preview button – this is how you’ll know that you’ve randomly been selected for the new estimate):
And here is my new ethnicity estimate:
As you can see, my estimate is VERY different. My report now reveals the trace African and Native American ancestry that comes from my Central American ancestry, and which is reported in my 23andMe report (click here to compare). I also now have significant Great Britain and Ireland contributions, which I had expected to see based on my genealogical tree.
Clicking on one of the population names, for example “Great Britain,” changes the interface to concentrate on that information:
The map on the right side of the page shows the region that this category encompasses; it is primarily the U.K., but it can also include the surrounding areas of France, Germany, Denmark, and Ireland. Given the recent and extensive admixture among these populations, it isn’t surprising that they can be challenging to distinguish genetically.
Below each map is a much larger section providing more information, including a comparison to the “typical” person from the selected region.
Another interesting bit of information is the other regions commonly seen for each selected estimate. Below are the regions commonly seen in people native to Great Britain:
As another example, here is my small Native American ethnicity:
Because there are few Native American reference samples, associating specific DNA with any particular region within North and South America is currently extremely difficult, and the results are noisy.
The Ethnicity Estimate Process
If you look at the screenshots above, you’ll see for example that my Great Britain estimate is 55%, with a range of 25% to 85%. To fully understand what this means, it’s helpful to understand how AncestryDNA calculates these numbers.
Your processed DNA data is run through an algorithm called ADMIXTURE, which estimates the proportions of “membership” in a set of ancestral clusters, or populations. This analysis is not done just once, however. It is done 40 separate times, and each time a random 95% of your processed raw data is used. The result reported to you is the mean (the average) of those 40 runs.
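To make the resampling idea concrete, here is a minimal Python sketch. It is not AncestryDNA's code and it does not run ADMIXTURE: the `toy_estimate` function is a hypothetical stand-in that simply counts made-up population labels, and the marker list is invented for illustration. Only the 40 runs and the random 95% subset come from the description above.

```python
import random
from collections import Counter


def toy_estimate(markers):
    """Hypothetical stand-in for ADMIXTURE: percent of markers per label."""
    counts = Counter(markers)
    total = len(markers)
    return {pop: 100.0 * n / total for pop, n in counts.items()}


def averaged_estimate(markers, n_runs=40, fraction=0.95, seed=0):
    """Average the estimate over n_runs random subsets of the data."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample_size = int(len(markers) * fraction)
    totals = Counter()
    for _ in range(n_runs):
        # Each run sees a different random 95% of the markers.
        subset = rng.sample(markers, sample_size)
        for pop, pct in toy_estimate(subset).items():
            totals[pop] += pct
    # The reported figure is the mean across all runs.
    return {pop: total / n_runs for pop, total in totals.items()}


# Invented data: 1,000 markers, each tagged with a population label.
markers = ["Great Britain"] * 550 + ["Ireland"] * 300 + ["Europe West"] * 150
result = averaged_estimate(markers)
for pop, pct in sorted(result.items(), key=lambda item: -item[1]):
    print(f"{pop}: {pct:.1f}%")
```

Because each run uses a slightly different slice of the data, the individual runs disagree a little, and averaging them smooths that out.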
The range provided (e.g., the 25% to 85% Great Britain in my example above) describes the spread of the results across those ADMIXTURE runs, which, as I understood it, is taken within 2 standard deviations. In other words, some runs of ADMIXTURE put me as low as about 25% Great Britain and others as high as about 85%, but the average among the 40 different analyses was 55%.
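As a rough illustration of how a mean and a 2-standard-deviation range could come out of 40 runs, here is a small Python sketch. The run values are invented so that the numbers match my Great Britain figures; they are not my actual per-run results. Using the population standard deviation and clipping the range to 0-100% are my own assumptions, not something AncestryDNA described.

```python
from statistics import mean, pstdev


def summarize_runs(run_percentages, n_sd=2):
    """Return (mean, low, high), where the range is mean +/- n_sd std devs."""
    center = mean(run_percentages)
    spread = pstdev(run_percentages)  # assumption: population std deviation
    # Assumption: a percentage cannot fall outside 0-100, so clip the range.
    low = max(0.0, center - n_sd * spread)
    high = min(100.0, center + n_sd * spread)
    return center, low, high


# Invented data: 40 runs, half at 40% and half at 70%.
runs = [40.0] * 20 + [70.0] * 20
estimate, low, high = summarize_runs(runs)
print(f"Estimate: {estimate:.0f}% (range {low:.0f}% to {high:.0f}%)")
```

With these made-up runs the standard deviation is 15 points, so the range spans 30 points on either side of the 55% average.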
Populations that are “noisy” – i.e., are not clearly genetically distinguishable from all other populations – will have wider ranges, while less noisy populations may have narrower ranges. For example, my Native American estimate has an extremely narrow range in part because those reference samples may not be as noisy.
In the prior ethnicity estimate, AncestryDNA was primarily utilizing data gathered by Sorenson Molecular Genealogy Foundation (“SMGF”).
For the update, AncestryDNA obtained – and used their chip to analyze – DNA from approximately 3,000 individuals around the world with well-established pedigrees from their location. For example, samples were obtained from people in France who knew, to the best of their knowledge, that their ancestry was from France as far back as they could examine. Here is an example of locations within Europe from which new samples were obtained:
As a result of this extensive new sampling, the AncestryDNA test is now comparing customers’ DNA to 26 reference populations (see below for a list).
Great Graphics and Information
In addition to the customer- and population-specific information discussed above, AncestryDNA also provides some significant new behind-the-scenes information about how the analysis is performed, including the limitations of the calculations. Clicking on the “help” button, marked with the red arrow in the figure below, leads to a pop-up with more information:
Each one of the six links in the pop-up leads to a screen with additional information.
Here are the 26 populations for which AncestryDNA has samples and to which it is comparing customers’ DNA:
- Africa North
- Africa Southeastern Bantu
- Africa Southcentral Hunter Gatherers
- Native American
- Asia South
- Asia East
- Asia Central
- Great Britain
- Europe West
- Iberian Peninsula
- Europe East
- Finnish/Northern Russia
- European Jewish
- Near East