An In-Depth Analysis of the Use of Small Segments as Genealogical Evidence

TL;DR Executive Summary:

  1. The size of a “small segment” is not strictly defined but is generally less than 10 cM and preferably less than 7 cM;
  2. A very large percentage of small segments are not valid matching segments (i.e., are false segments), and our ability to distinguish valid segments from false segments is either nonexistent or very limited;
  3. Although in most cases we cannot reliably utilize a small shared segment as genealogical evidence, that does not negate the potential value of a genealogical relationship;
  4. There is no evidence that: (1) triangulating a small segment; or (2) sharing a large segment in addition to the small segment; or (3) finding a shared ancestor, increases the probability that a small segment is valid;
  5. Even if a small segment is somehow identified as valid, perhaps an even bigger challenge is determining whether the small segment (which can be 10, 20 or more generations old) came from the identified ancestral line or another known or unknown shared ancestral line.

Small segments have long been a controversial subject in the genealogy community. Some love them, some hate them. Here we will look at all the available evidence surrounding small segments, some of the misconceptions associated with them, and some of the ways we might be able to utilize small segments in our research.

What are “Small Segments”?

One of the biggest problems surrounding the discussion of small segments is that there is no strict definition of the term. A “small segment” to one person might be anything 5 cM or less, while a “small segment” to another person might be anything less than 10 cM. And there are many other variations. For purposes of this analysis, we will define a small segment as any single segment of DNA less than 8 cM (although segments as high as 10 cM or more can be problematic as well). ...

Sharing Large Segments With a Match Does Not Validate Small Segments Shared With That Match

OK, that could be one of the worst blog titles I’ve written, but it’s intentional. When people share this post, I want the title to clearly convey the lesson.

Small Segments are Poison

We know that many small segments are false, and thus that many distant matches are false positives. I have written about small segments and distant matches many times. For a few background articles, see the following:

The (most current as of September 2017) definitive article on the nature of false versus true small segments is “Reducing Pervasive False-Positive Identical-by-Descent Segments Detected by Large-Scale Pedigree Analysis.” The paper is available online for free (http://mbe.oxfordjournals.org/content/31/8/2212). In the paper, the researchers found that more than 67% of all reported segments shorter than 4 cM are false-positive segments. At least 60% of 4 cM segments were false-positive, and at least 33% of 5 cM segments were false-positive. The number of false-positives decreased fairly rapidly above 5 cM. See my analysis of this paper here. ...
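To make those percentages concrete, here is a small Python sketch that applies the minimum false-positive rates quoted above to a hypothetical tally of segments. The bin labels, the tallies, and the function name are assumptions for illustration. Because the rates are lower bounds, the result is a floor, not an estimate of the actual number of false segments.

```python
# Minimum false-positive rates as quoted above (Durand et al., 2014).
# The bin labels are a simplification for illustration only.
FP_LOWER_BOUND = {
    "<4 cM": 0.67,  # "more than 67%"
    "4 cM": 0.60,   # "at least 60%"
    "5 cM": 0.33,   # "at least 33%"
}

def min_expected_false(counts):
    """Minimum number of segments expected to be false, given how many
    reported segments fall in each size bin."""
    return sum(FP_LOWER_BOUND[size] * n for size, n in counts.items())

if __name__ == "__main__":
    # Hypothetical tallies from a single match list.
    counts = {"<4 cM": 30, "4 cM": 20, "5 cM": 10}
    print(f"at least {min_expected_false(counts):.1f} of "
          f"{sum(counts.values())} segments expected to be false")
```

With these made-up tallies, more than half of the 60 segments would be expected to be false even under the most optimistic reading of the paper.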

How Do DNA Segments Get Smaller?

Many genetic genealogists, myself included, often talk about DNA segments getting “broken up” or “broken down” as they are passed from one generation to the next. But this language can be misleading, since DNA isn’t really “broken up” into pieces when it is passed down; instead, a few pieces are traded between non-sister chromatids of a pair of homologous chromosomes in a process called RECOMBINATION.

Genetic recombination is a process of crossover between chromosomes during MEIOSIS (meiosis = a very specialized cell division that creates eggs and sperm for reproduction). Just before meiosis begins, the cell duplicates its chromosomes. Normally, a human cell has 23 pairs of chromosomes, for a total of 46 chromosomes. After this duplication step, there are 92 chromosome copies (called chromatids). There are 4 copies of chromosome 1 (2 copies of the chromosome you got from your mother, and 2 copies of the chromosome you got from your father), 4 copies of chromosome 2, and so on. ...
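A toy simulation can illustrate how segments shrink over generations. This Python sketch follows one of an ancestor's two copies of a chromosome down a single line of descent. It assumes the simple Haldane model, in which crossovers are placed independently at an average rate of one per 100 cM with no interference. It also assumes a hypothetical 280 cM chromosome. Real crossover patterns differ by sex and by chromosome, so treat this as an illustration, not a model of any testing company's data.

```python
import random

def crossover_points(length_cm, rng):
    """Crossover positions (cM) along one chromosome under the Haldane
    model: gaps between crossovers are exponential with a mean of 100 cM."""
    points = []
    pos = rng.expovariate(1 / 100)
    while pos < length_cm:
        points.append(pos)
        pos += rng.expovariate(1 / 100)
    return points

def transmit(segments, length_cm, rng):
    """One meiosis. `segments` are (start, end) intervals in cM on the
    homolog carrying the tracked ancestor's DNA; returns the parts of
    those intervals that end up in the egg or sperm."""
    boundaries = [0.0] + crossover_points(length_cm, rng) + [length_cm]
    on_tracked = rng.random() < 0.5  # which homolog the gamete starts on
    copied = []  # stretches of the gamete copied from the tracked homolog
    for a, b in zip(boundaries, boundaries[1:]):
        if on_tracked:
            copied.append((a, b))
        on_tracked = not on_tracked  # each crossover switches homologs
    result = []
    for s, e in segments:
        for a, b in copied:
            lo, hi = max(s, a), min(e, b)
            if lo < hi:
                result.append((lo, hi))
    return result

def simulate(generations, length_cm=280.0, seed=1):
    """Track one ancestral chromosome copy through successive meioses."""
    rng = random.Random(seed)
    segments = [(0.0, length_cm)]
    history = []
    for _ in range(generations):
        segments = transmit(segments, length_cm, rng)
        history.append(segments)
    return history

if __name__ == "__main__":
    for gen, segs in enumerate(simulate(8), start=1):
        total = sum(e - s for s, e in segs)
        print(f"generation {gen}: {len(segs)} segment(s), {total:.1f} cM")
```

Running it with different seeds shows the tracked DNA splitting into smaller pieces. It can also disappear from the line entirely within a few generations. Nothing is ever "broken down" beyond being swapped out at crossover points.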

Small Matching Segments – Examining Hypotheses

Last week I published “Small Matching Segments – Friend or Foe?” to join in the community’s conversation about the use of “small” segments of DNA, referring to segments 5 cM and smaller (although keep in mind that the term “small,” without a more specific definition, will mean different things to different people).

The question that the community has been struggling with is whether small segments of DNA can be used as genealogical evidence, and if so, how they can be used.

As I wrote in my post, a significant percentage of small segments are false positives, with the number at least 33% and likely much higher. In my examination and in the Durand paper I discuss, a false positive is defined as a small segment that is not shared between a child and at least one of the parents. ...
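The parent-child definition of a false positive can be expressed as a simple overlap test. In this Python sketch, a segment the child shares with a match is flagged as false if neither parent shares a sufficiently overlapping segment with that same match. The `Segment` type, the base-pair coordinates, and the 50% minimum-overlap threshold are hypothetical choices for illustration. They are not the procedure used in the Durand paper.

```python
from typing import NamedTuple

class Segment(NamedTuple):
    chromosome: str
    start: int  # base-pair position
    end: int

def overlap(a: Segment, b: Segment) -> int:
    """Number of base pairs two segments have in common (0 if none)."""
    if a.chromosome != b.chromosome:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))

def is_false_positive(child_seg: Segment, parent_segs: list[Segment],
                      min_fraction: float = 0.5) -> bool:
    """True if no segment that a parent shares with the same match covers
    at least `min_fraction` of the child's segment."""
    needed = min_fraction * (child_seg.end - child_seg.start)
    return all(overlap(child_seg, p) < needed for p in parent_segs)

if __name__ == "__main__":
    child = [Segment("7", 1_000_000, 3_000_000),
             Segment("12", 50_000_000, 52_000_000)]
    mother = [Segment("7", 900_000, 3_100_000)]
    father: list[Segment] = []
    print([is_false_positive(s, mother + father) for s in child])
    # → [False, True]
```

In this made-up example, the chromosome 7 segment is inherited from the mother and passes. The chromosome 12 segment appears in neither parent and is flagged as a false positive.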

Small Matching Segments – Friend or Foe?

There has been a great deal of conversation in the genetic genealogy community over the past couple of weeks about the use of “small” segments of matching DNA. Typically, the term “small” refers to segments of 5 cM and smaller, although some people include segments of 7 cM or even 10 cM and smaller in the definition.

The question, essentially, is whether small segments of DNA can be used as genealogical evidence, and if so, how they can be used.

While it may seem at first that all shared segments of DNA could constitute genealogical evidence, unfortunately some small segments are IBS (identical by state), creating “false positive” matches for reasons other than recent ancestry. These segments sometimes match because of a lack of phasing, phasing errors, or a variety of other reasons. One thing, however, is clear: there is no debate in the genetic genealogy community that many small segments are false positive matches. There IS debate, however, regarding the rate of false positive matches, and what that means for the use of small segments as genealogical evidence. ...
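This toy Python example shows how a lack of phasing can create a false match. An unphased comparison only asks whether the match carries either of your two alleles at each SNP. A match whose DNA zig-zags between your maternal and paternal alleles will therefore look like a continuous match, even though it does not follow either of your actual chromosomes. The 8-SNP haplotypes are invented for illustration. Real matching algorithms compare many more SNPs and apply minimum-length thresholds.

```python
def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = current = 0
    for ok in flags:
        current = current + 1 if ok else 0
        best = max(best, current)
    return best

def unphased_run(mat, pat, match_hap):
    """Longest run of SNPs where the match allele equals EITHER of the
    person's two alleles (what an unphased comparison checks)."""
    return longest_run(m in (a, b) for a, b, m in zip(mat, pat, match_hap))

def phased_run(hap, match_hap):
    """Longest run of SNPs where the match allele equals the allele on one
    specific haplotype (maternal or paternal)."""
    return longest_run(a == m for a, m in zip(hap, match_hap))

if __name__ == "__main__":
    maternal = "AAGGCTTA"  # hypothetical alleles at 8 SNPs
    paternal = "GCAATCGG"
    # A match haplotype that alternates between the two, SNP by SNP.
    zigzag = "".join(maternal[i] if i % 2 == 0 else paternal[i]
                     for i in range(len(maternal)))
    print(unphased_run(maternal, paternal, zigzag))  # → 8
    print(phased_run(maternal, zigzag))              # → 1
    print(phased_run(paternal, zigzag))              # → 1
```

The unphased comparison reports a run across all 8 SNPs. Compared against each of the actual parental haplotypes, however, the match never agrees for more than 1 SNP in a row.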

A Small Segment Round-Up

If you aren’t already a member of the coolest Facebook group ever, Genetic Genealogy Tips & Techniques, you really should be! We have a friendly and engaging environment, and everyone learns something new every day!

This post is meant to answer a question or issue that is raised almost daily in the group, and that is the issue of small shared DNA segments. Although these small segments are alluring, they are the mythological sirens of the genealogical world!

Small Segments Executive Summary

Here’s a bite-sized summary of the content below:

  • Many to most small segments (at a minimum, those 7 cM and smaller) are FALSE, meaning they are NOT actually shared by the two matches, and therefore do NOT indicate shared ancestry;
  • This is supported by a 2014 paper by 23andMe scientists showing that at least 33% of 5 cM phased DNA segments are false-positive (and it’s much worse for unphased segments or segments smaller than 5 cM);
  • This is further supported by evidence that anywhere from 20-35% of distant matches at a testing company are not shared with either tested parent;
  • This is further supported by evidence that phasing your DNA with two tested parents significantly reduces the number of matches below 10 cM (with proportionally more matches reduced as the segment size gets smaller);
  • There is currently no evidence that triangulating segments or finding a paper trail provides a mechanism for distinguishing between false segments and valid segments;
  • Since we can’t tell the difference between false small segments and valid small segments, we must avoid these small segments to avoid poisoning our genealogical conclusions with false data; and
  • Beware any research or conclusion that uses these small segments without specifically addressing the issues that are known – based on all the scientific research and evidence gathered to date – to surround small segments.
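The parent-phasing evidence in the summary can be checked against your own match lists. One way is to bin matches by their largest shared segment and ask what fraction appear on neither parent's match list. The record format, bin edges, and function names in this Python sketch are hypothetical. A true match should normally also match one of your parents, although a parent's copy of a segment can occasionally fall just under a company's matching threshold.

```python
from collections import defaultdict

def size_bin(largest_cm, edges=(5, 7, 10)):
    """Label a match by its largest segment, e.g. '5-7 cM' or '>=10 cM'."""
    lower = 0
    for edge in edges:
        if largest_cm < edge:
            return f"{lower}-{edge} cM"
        lower = edge
    return f">={edges[-1]} cM"

def unshared_fraction_by_bin(matches, edges=(5, 7, 10)):
    """`matches` holds hypothetical (largest_cm, on_mother_list,
    on_father_list) records. Returns, per size bin, the fraction of
    matches found on neither parent's match list."""
    totals = defaultdict(int)
    unshared = defaultdict(int)
    for largest_cm, on_mother, on_father in matches:
        label = size_bin(largest_cm, edges)
        totals[label] += 1
        if not (on_mother or on_father):
            unshared[label] += 1
    return {label: unshared[label] / totals[label] for label in totals}

if __name__ == "__main__":
    sample = [(4.5, False, False), (4.8, True, False), (6.0, False, False),
              (6.5, False, True), (6.9, False, True), (8.0, True, False),
              (25.0, False, True)]
    for label, frac in unshared_fraction_by_bin(sample).items():
        print(f"{label}: {frac:.0%} not shared with either parent")
```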

If you’re interested in learning more, keep reading!

Small Segments In Detail

One of the most common questions in the group has to do with small segments. There’s no exact definition of “small” when it comes to small segments, but many of us define them as being a single segment of DNA of 7 cM or smaller. Others use 5 cM or smaller, while others use 10 cM or smaller. Personally, I consider segments of 7 cM or less to be “small,” although when I’m being very conservative I use a definition of 10 cM or smaller. ...

How Many Segments Do You Share?

I have told people in the past that we share a single segment of meaningful IBD DNA with the vast majority of our genetic matches (where IBD means Identity-by-Descent, or a valid matching segment of DNA from a recent genealogical relationship). I usually say that we share a single segment of DNA with 99% of our matches, but that’s been an off-the-cuff estimate. I wanted to have better data to cite, so I took a closer look at this issue.

At FTDNA, you can download a list of all of your matches:

I downloaded my list and removed all of my targeted test-takers (anyone I tested or asked to test). These close test-takers would skew the data.

After removing them from my match list, I have a total of 2,491 matches at Family Tree DNA.

Family Tree DNA also allows you to download a list of all the segments you share with your matches: ...
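A count like this can be reproduced with a few lines of Python. The sketch assumes the segment download is a CSV with one row per shared segment and columns named “Match Name” and “Centimorgans.” Those column names are an assumption, so check them against your own file. The optional minimum-cM filter lets you count only segments above a chosen size, and the sample rows are made up.

```python
import csv
import io
from collections import Counter

def segments_per_match(csv_text, min_cm=0.0,
                       name_col="Match Name", cm_col="Centimorgans"):
    """Count segment rows per match, ignoring segments below `min_cm`.
    Column names are assumptions; adjust them to match your download."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[name_col] for row in reader
                   if float(row[cm_col]) >= min_cm)

def fraction_single_segment(counts):
    """Fraction of the counted matches that share exactly one segment."""
    if not counts:
        return 0.0
    return sum(1 for n in counts.values() if n == 1) / len(counts)

# Hypothetical sample rows, not real match data.
SAMPLE = """Match Name,Chromosome,Start Location,End Location,Centimorgans
Match A,3,1000000,9000000,12.1
Match B,5,2000000,4000000,6.3
Match B,11,7000000,15000000,9.8
Match C,20,3000000,6000000,7.4
"""

if __name__ == "__main__":
    print(fraction_single_segment(segments_per_match(SAMPLE)))
    print(fraction_single_segment(segments_per_match(SAMPLE, min_cm=7)))
```

Matches whose segments are all filtered out, or that are missing from the segment file, do not appear in the counts. The denominator can therefore differ from the total on your match list.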

Losing Distant Matches at AncestryDNA

During a phone call with AncestryDNA representatives this week (unfortunately I was not able to attend), numerous genealogists heard two major announcements:

  1. The AncestryDNA database has hit 18 million test takers (such great news!); and
  2. There are significant changes coming to our DNA match list.

The announcement started to appear on the DNA match list page yesterday:

Clicking on the link brings up information about the changes:

The changes to the DNA match list comprise the following:

  1. The number of shared segments should improve

From the announcement: “The DNA you share with a match is distributed across segments – short segments, long segments, or some combination of both. Our updated matching algorithm may reduce the estimated number of segments you share with some of your DNA matches. This doesn’t change the estimated total amount of shared DNA (measured in centimorgans/cM) or the predicted relationship to your matches.” ...

Examining Outliers in Shared cM Amounts – Part 2

In this blog post we will briefly review an extreme Grandparent/Grandchild relationship, where a grandchild appears to share just 9% of her DNA with a paternal grandmother rather than the expected 25%. All information is anonymized.
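Percentages like these come from dividing the shared cM by the total cM counted for the genome. That total varies by testing company and by whether the X chromosome is included. The 6,800 cM used in this Python sketch is therefore an illustrative assumption, not a figure from this case.

```python
ASSUMED_TOTAL_CM = 6800  # illustrative assumption; varies by company

def percent_shared(shared_cm: float, total_cm: float) -> float:
    """Percentage of the genome shared, given shared cM and the total cM
    the comparison is measured against."""
    return 100 * shared_cm / total_cm

def cm_for_percent(percent: float, total_cm: float) -> float:
    """Shared cM corresponding to a given percentage."""
    return percent / 100 * total_cm

if __name__ == "__main__":
    print(cm_for_percent(25, ASSUMED_TOTAL_CM))  # expected grandparent share
    print(cm_for_percent(9, ASSUMED_TOTAL_CM))   # the outlier percentage
```

Under that assumption, the gap is roughly 1,700 cM expected versus about 612 cM observed, which is why a result like this demands careful scrutiny before it is accepted as an outlier.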

I’m a little afraid to post this article about an extreme outlier scenario. There is a danger that it could support misinterpretation rather than foster critical thinking. If you have a possible outlier scenario, be sure to try to disprove that it is an outlier situation, rather than simply proceeding as if it is an outlier. Avoid confirmation bias!

This is the third post on my blog specifically examining outliers in confirmed relationships:

  1. Analyzing a Lack of Sharing in 2C1R Relationship
  2. Examining Outliers in Shared cM Amounts

Is it an Outlier? The Extreme Danger of Confirmation Bias

This was discussed in a previous article about outliers, but it bears repeating. ...

TGG’s Top Posts in 2017

I started The Genetic Genealogist on February 12, 2007 with my first post, “New estimates for the arrival of the earliest Native Americans.” There were few educational resources for genetic genealogy back then, and all testing was Y-DNA and mtDNA. Although 23andMe would launch the first large-scale atDNA test a few months later in November of 2007 (see “23andMe Launches Their Personal Genome Service” announcing the $1,000 test), it would be a couple of years until they used the results for cousin matching. Today, almost 11 years later, there are 617 posts with more than 310,000 words.

Here’s a screenshot from the blog in December 2007:

This year I posted about 30 times about a wide variety of topics. Here are the most popular posts in 2017: ...