Your ‘Ethnicity Estimate’ Doesn’t Mean What You Think It Does

Since a given region still contains some degree of genetic variation, it’s possible for a reference group to miss some of that diversity. To use an analogy, if you selected 23 New Yorkers who all happened to live in Little Guyana and made them a reference group for all New Yorkers, you might not get a representative sample of the city. Haplotypes common amongst Guyanese people would probably be overrepresented.

Jeff says that anything more granular than continent-level estimates involves some big-time guesswork. “We’re making a huge assumption that this variant is the only variant, and that these populations are somewhat of a monolith,” she says. “We really do need more information to dig down to more detailed population differences within these continents.”

If Ancestry doesn’t have a reference population that matches your specific ancestry, the algorithm will assign you the next closest region. There’s no reference group for Denmark, for instance, so people with Danish ancestry “tend to get somewhere around a quarter Germany, Norway, Sweden, and England,” says Starr. Lacking specificity, the algorithm is searching for haplotypes most similar to those found amongst Danes, but the result can be misleading. “You wouldn’t want them to think, ‘Oh, I have one grandparent from [each country],’” says Starr.

Countries like Denmark, and all countries to some degree, pose a challenge because of what’s called admixing, which is basically a jargony word for mixing. Human history is one of migration, of invasion, of populations intermingling. That makes it tough to distinguish certain regions from one another, especially neighboring ones. Germanic tribes and Scandinavian Vikings both settled in the British Isles, for instance, meaning a person from modern-day England might have DNA from all of those regions.

And of course, nations are human inventions, their borders cropping up and shifting over time. What we call “France” has ballooned and shrunk over the centuries, overlapping at times with modern-day northern Italy. “In our previous update, a lot of people in Northern Italy were getting France,” says Starr. “If you look at history, it makes sense because that part of the world was not very distinct. But in this update, we were able to split Italy into North and South. People from Northern Italy got Italy back, so there’s lots more Northern Italy than France now.”

That’s also the reason all those people suddenly became more Scottish. The update separated what had previously been two regions in the Ancestry database—England/Wales/Northwestern Europe and Ireland/Scotland—into four: England, Ireland, Scotland, and Wales. Before the change, “Scottish people typically got a lot of both Ireland & Scotland and England, Wales & Northwestern Europe in their results—often almost a 50/50 split,” a post on the company’s website explained. “Since Scotland appeared in only one of the names, some people wondered what had happened to their Scottish ancestry. It was there all the time, but ‘hidden’ under another name.”

In a white paper posted to the company’s website in September, Ancestry scientists issued a self-report on their accuracy: They gave themselves a B. They took a sampling of reference panel members, whose ancestries were already known, and ran those members’ DNA through the algorithm to see if it would assign each person to the correct region. They found the algorithm to be correct 84.2 percent of the time on average, but for identifying certain groups, such as indigenous Cuban people, the accuracy rate sank as low as 32 percent.
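To make the arithmetic behind that kind of self-check concrete, here is a minimal sketch in Python. It is not Ancestry’s code: the region names, the toy data, and the function names are all hypothetical, and it assumes the “average” is taken across regions, which the white paper summary above does not spell out. It simply shows how one overall figure can sit alongside a much lower score for a single group.

```python
from collections import defaultdict


def accuracy_by_region(samples):
    """Return the fraction of correct assignments for each true region.

    `samples` is a list of (true_region, predicted_region) pairs.
    Both the data layout and the function name are hypothetical.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for true_region, predicted_region in samples:
        totals[true_region] += 1
        if predicted_region == true_region:
            correct[true_region] += 1
    return {region: correct[region] / totals[region] for region in totals}


def mean_accuracy(per_region):
    """Average the per-region accuracies (assumed averaging scheme)."""
    return sum(per_region.values()) / len(per_region)


# Made-up example: people whose ancestry is already known, paired with
# the region a hypothetical algorithm assigned them.
toy_samples = [
    ("Region A", "Region A"),
    ("Region A", "Region A"),
    ("Region A", "Region A"),
    ("Region A", "Region B"),
    ("Region B", "Region B"),
    ("Region B", "Region A"),
]

per_region = accuracy_by_region(toy_samples)
overall = mean_accuracy(per_region)

for region, score in sorted(per_region.items()):
    print(f"{region}: {score:.0%}")
print(f"Average across regions: {overall:.1%}")
```

In this toy data, one region is matched correctly three times out of four and the other only once out of two, so a single averaged number hides the weaker group. The same effect is what lets an overall score in the 80s coexist with a score in the 30s for one population.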

Access to indigenous people’s DNA is ethically fraught, which makes it tricky to come by. The reasons include difficulty obtaining informed consent, concerns about exploiting indigenous people for profit, perceptions that scientists are more interested in preserving endangered tribes’ DNA than their members, and worries that the test results could be used as tools of continuing oppression, for example, to deny people land rights. As a result, the DNA of indigenous people is often underrepresented in genetic databases, leading to results that can be misinterpreted. “For example, when Elizabeth Warren said that she had Native ancestry, what she was actually referring to was Latinx and South American reference populations and calling that indigenous American,” says Jeff. Ancestry gets around this by using DNA from admixed populations and identifying the segments that correspond to indigenous groups. They use only that portion in their reference panel, meaning they don’t need people with long family histories in a single region.