Sunday, December 28, 2014
Not long ago Lazaridis et al. proposed that most present-day Europeans were derived from three distinct ancestral populations: Ancient North Eurasians (ANE), Early European Farmers (EEF) and Western European Hunter-Gatherers (WHG).
However, this is essentially a stop-gap model, which will in all likelihood be replaced by a partly revised and more robust model once someone manages to sequence a genome or two from the Neolithic Near East. That's because EEF is clearly a hybrid component, largely made up of ancient Near Eastern ancestry and something very WHG-like, sometimes in very different proportions depending on the location and archeological context of the EEF genomes being analyzed.
So what will this new model look like, you might ask? Probably like this, where EEF is replaced by an Early Neolithic Farmer (ENF) component from the ancient Near East, or something very similar:
The diagram above is basically a Principal Component Analysis (PCA) based on output from my new West Eurasia K8 test (see here), in which the Near Eastern component is synonymous with ENF.
I'm quite certain that these results are very close to the truth. However, just in case the Near Eastern ancestry proportions are a little bit too high (and we won't know until we see those ancient genomes from the Near East), I've got another version that offers lower-bound Near Eastern estimates.
It might be useful to keep in mind that I rotated the plots to fit geography. As a result, Component 1, which packs around 85% of the variance on both plots, appears smaller than Component 2, which only carries around 10% of the variance.
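For readers who want to reproduce this kind of plot, here is a minimal sketch of a PCA on admixture proportions followed by a rotation. It assumes NumPy and uses a made-up matrix of population averages, with rows as populations and columns as components. It is not necessarily the method used for the plots above. What it shows is that rotating a plot only changes its orientation. The distances between populations stay the same, and so does the variance explained by each principal component.

```python
import numpy as np

def pca_2d(proportions):
    """Project populations onto the first two principal components.

    proportions: (n_populations, n_components) array of admixture proportions.
    Returns (scores, explained_variance_ratio) for PC1 and PC2.
    """
    X = np.asarray(proportions, dtype=float)
    Xc = X - X.mean(axis=0)              # centre each ancestral component
    # Rows of vt are the principal axes, ordered by singular value
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ vt[:2].T               # coordinates on PC1 and PC2
    var = s ** 2
    return scores, var[:2] / var.sum()

def rotate(scores, degrees):
    """Rotate 2D PCA coordinates counter-clockwise (e.g. to line up with a map)."""
    t = np.radians(degrees)
    r = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t),  np.cos(t)]])
    return scores @ r.T

# Made-up proportions for four populations and three components
X = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.1, 0.2, 0.7],
              [0.4, 0.4, 0.2]])
scores, evr = pca_2d(X)
rotated = rotate(scores, 90)
print(evr)   # share of the variance carried by PC1 and PC2
```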
A spreadsheet with West Eurasia K8 results for a wide variety of populations is available here. Please note that there are two sheets, with the second sheet showing the lower bound Near Eastern ancestry proportions.
We'll probably learn of more ancient European meta-populations as many more genomes are sequenced from across Eurasia. Nevertheless, I doubt this will affect the model outlined above. That's because I'm expecting all such meta-populations to be mixtures of ANE, ENF and/or WHG, as well as, in some cases, extra-West Eurasian components.
However, I suspect that West Eurasia will have to be modeled in a different way from Europe, with, amongst other things, the so-called Basal Eurasian component replacing ENF. But for this to happen we'll need at least one ancient genome that is in large part of Basal Eurasian origin. In any case, that's a whole different subject.
ANE is the primary cause of west to east genetic differentiation across West Eurasia
Gokhem2 + Motala12 =/= present-day Swedes
Sunday, November 30, 2014
Monday, September 8, 2014
Update 01/01/2015: Crowdfunding for 2015 + new K8 test. See here.
As its name implies, the Eurogenes ANE K7 is specifically designed to estimate Ancient North Eurasian (ANE) ancestry. It's based on a series of supervised runs with the ADMIXTURE software, and freely available at GEDmatch under the Eurogenes Ad-mix tests tab.
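For anyone curious about what a supervised ADMIXTURE run involves, here is a short explanation. In supervised mode, ADMIXTURE reads a .pop file that sits alongside the .bed file. The .pop file has one line per individual, in .fam order. Each reference individual gets an ancestral label, and each individual whose proportions are to be estimated gets "-". The sketch below writes such a file. The sample IDs and labels are hypothetical, and the sketch does not reproduce how the ANE reference was built from the Karitiana genomes.

```python
def write_pop_file(sample_ids, reference_labels, path):
    """Write an ADMIXTURE .pop file for a supervised run.

    sample_ids: individual IDs in the same order as the .fam file.
    reference_labels: dict mapping reference individual ID -> ancestral label.
    Individuals missing from the dict get '-', i.e. they are estimated freely.
    Returns the number of distinct labels, which should equal K.
    """
    lines = [reference_labels.get(sid, "-") for sid in sample_ids]
    with open(path, "w") as fh:
        fh.write("\n".join(lines) + "\n")
    return len({lab for lab in lines if lab != "-"})

# Hypothetical IDs: two references and two individuals to be estimated
k = write_pop_file(["ref1", "user1", "ref2", "user2"],
                   {"ref1": "ANE", "ref2": "ENF"},
                   "example.pop")
print(k)  # number of reference labels; must match K on the command line
```

With real data, the .pop file would share the basename of the .bed file. The run would then be started with something like `admixture --supervised data.bed 7`.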
The ANE component is not modeled on the Mal'ta boy or MA-1 genome, the main ANE proxy in scientific literature, because this sample didn't offer enough high-quality markers for the job. So instead, I used the non-East Asian portions of several Karitiana genomes from the HGDP.
I wasn't sure what was going to come of that, but it actually seems to have worked out really well. Below are the results for several individuals that were not used in the making of the test, and clearly their ANE scores look pretty damn solid going by recent papers. For instance, both Lazaridis et al. and Raghavan et al. estimate the Karitiana Indians at just over 41% ANE (see here and here).
You can also cross-check your ANE score with the results in this spreadsheet and table. The spreadsheet includes ANE estimates for more than 2,000 individuals that I tested with the ADMIXTURE software in supervised mode (see here).
On the other hand, the table comes from the Lazaridis et al. preprint, which I'm sure many of you have read by now several times over. And please pay attention to the range of ANE proportions for each population, rather than just the point estimates.
Obviously, there are also six other ancestral components in this test (hence the K7 in the name). They're basically byproducts of me trying to isolate ANE, and don't necessarily mean anything. Nevertheless, here's a brief rundown of what I think some of them might represent...
Ancestral South Eurasian (ASE): this is a really basal cluster that peaks in tribal groups of Southeast Asia. It's probably very similar in some ways to the Ancestral South Indian (ASI) component described by Reich et al. a few years ago.
Western European/Unknown Hunter-Gatherer (WHG-UHG): this essentially looks like a West Eurasian forager component, and includes the forager-like stuff carried by Neolithic farmers (Oetzi the Iceman has 40% of it).
Early Neolithic Farmer (ENF): I'd say that this is the component of the earliest Neolithic farmers from the Fertile Crescent.
The other three components should be easy to work out from their names. They're almost identical to several components with the same or similar names from my other tests.
Some of you might be wondering why this test doesn't offer an Early European Farmer (EEF) cluster. But the answer to that should be obvious by now. EEF is not a stable ancestral component. It's actually a composite of at least two ancient components, including the so-called Basal Eurasian and WHG-UHG. If it really were a genuine ancestral component, like ANE, then I'm pretty sure I'd be able to catch it with ADMIXTURE. But I can't.
Indeed, a really important thing to understand about the Lazaridis et al. study is that it doesn't actually attempt to estimate overall WHG-UHG ancestry in Europeans, but rather the excess WHG-UHG on top of what is already present in the EEF proxy Stuttgart.
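The arithmetic behind this point is simple. Suppose the EEF proxy is itself partly WHG-UHG. A person's total WHG-UHG is then the excess WHG from a three-way EEF/WHG/ANE model, plus the EEF proportion multiplied by the WHG-UHG share inside the EEF proxy. The sketch below assumes that share is known. The 40% used in the example is a hypothetical value, not an estimate for Stuttgart.

```python
def overall_whg(eef, whg_excess, whg_share_in_eef):
    """Total WHG-UHG-like ancestry when the EEF proxy already carries some.

    eef, whg_excess: proportions (0-1) from a three-way EEF/WHG/ANE model.
    whg_share_in_eef: assumed WHG-UHG fraction of the EEF proxy (hypothetical input).
    """
    if not 0.0 <= whg_share_in_eef <= 1.0:
        raise ValueError("whg_share_in_eef must be between 0 and 1")
    return whg_excess + eef * whg_share_in_eef

# Hypothetical person: 50% EEF, 30% WHG, 20% ANE; EEF assumed to be 40% WHG-UHG
print(overall_whg(0.5, 0.3, 0.4))  # about 0.5: 30% excess plus 20% hidden inside EEF
```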
Also worth noting is that this K7 can be a bit noisy. That's mainly because it's very difficult to correctly assign proportions of ancient ancestry to present-day samples. But as I said above, this test is basically designed to estimate ANE scores. If you want to learn about your overall ancestry, then I recommend the Eurogenes K13 and K15 tests.
Missing SNPs might also be an issue for some people. It stands to reason that results will be noisier with more missing markers and no calls.
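A quick way to see how many no-calls a raw data file contains is to compute its call rate. The sketch below assumes the 23andMe-style text layout:

- tab-separated rsid, chromosome, position and genotype columns
- '#' comment lines
- '--' for a no-call

Files from FTDNA and AncestryDNA use different layouts and would need their own parsing. The call rate also says nothing about markers that are simply absent from a given chip.

```python
import io

def call_rate(lines):
    """Fraction of SNP rows that have a genotype call.

    Assumes 23andMe-style rows: rsid<TAB>chromosome<TAB>position<TAB>genotype,
    '#' comment lines, and '--' marking a no-call.
    """
    total = called = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue                      # skip comment and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue                      # skip malformed rows
        total += 1
        if fields[3].strip() != "--":
            called += 1
    return called / total if total else 0.0

# Tiny made-up example; a real file would be opened with open(path)
sample = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0001\t1\t1000\tAG\n"
    "rs0002\t1\t2000\t--\n"
    "rs0003\t2\t3000\tCC\n"
    "rs0004\t2\t4000\tTT\n"
)
print(call_rate(sample))  # 0.75
```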
Have fun and don't forget to make a donation at some point to the Eurogenes cause, via the PayPal tab at the top right of the page. This will help me to keep up with what's going on in the world of Paleogenomics, and continue blogging and running analyses.
Iosif Lazaridis, Nick Patterson, Alissa Mittnik, et al., Ancient human genomes suggest three ancestral populations for present-day Europeans, arXiv, April 2, 2014, arXiv:1312.6639v2
Raghavan et al., Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans, Nature, Published online November 20, 2013, doi:10.1038/nature12736
Corded Ware Culture linked to the spread of ANE across Europe
Wednesday, July 16, 2014
This is really easy and should work well for most personal genomics customers (i.e. those of European ancestry and with data files from 23andMe, FTDNA and AncestryDNA).
First of all, make sure you have your Eurogenes K15 ancestry proportions from GEDmatch. Then do the following:
- download the 4 Ancestors Oracle (here)
- download the Eurogenes ancient genomes datasheet (here)
- place everything into the same directory
- double-click the 4 Ancestors Oracle icon (the big red number 4)
- select the Eurogenes K15 ancient genomes datasheet
- type your Eurogenes K15 ancestry proportions into the fields provided
- hit the go button and let it rip
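For those wondering what a "4 Ancestors" style oracle does, here is a minimal sketch of the general idea:

- try every combination of four reference samples, with repeats allowed
- average the ancestry proportions of each combination
- keep the combination that lands closest to the user's results

This is a hypothetical re-implementation for illustration, not Alexandr Burnashev's code. The real tool may use a different distance measure, search strategy and settings. The reference values in the example are made up.

```python
import math
from itertools import combinations_with_replacement

def four_ancestors(target, references):
    """Return (distance, names) for the best-fitting average of four references.

    target: list of ancestry proportions (e.g. the 15 K15 values).
    references: dict mapping sample/population name -> proportions on the same scale.
    """
    best = (math.inf, None)
    for combo in combinations_with_replacement(list(references), 4):
        # Average the four references component by component
        avg = [sum(references[name][i] for name in combo) / 4
               for i in range(len(target))]
        d = math.dist(avg, target)        # Euclidean distance (Python 3.8+)
        if d < best[0]:
            best = (d, combo)
    return best

# Made-up two-component example
refs = {"A": [100, 0], "B": [0, 100], "C": [50, 50]}
print(four_ancestors([75, 25], refs))  # (0.0, ('A', 'A', 'A', 'B'))
```

A brute-force search like this gets slow with large datasheets. With 100 references, for example, there are about 4.4 million combinations to check.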
I'm not sure I'm allowed to upload the 4 Ancestors Oracle online, but I couldn't find the original link, so let's assume for the time being that I am. In any case, many thanks to Alexandr Burnashev for this great tool.
You'll also find some modern populations in the datasheet. They're there so that users with ancestry from outside of Europe don't end up with ridiculous results.
Obviously, you can edit the datasheet to explore more options by removing or adding individuals and populations. A spreadsheet of Eurogenes K15 population averages is available here. The oracle settings can also be tweaked in a couple of ways to fine tune the results.
If the calculator crashes, try replacing the periods with commas in both the datasheet and your ancestry proportions (i.e. use commas as decimal separators, which some regional settings expect).
Please keep checking this post, because I'll attempt to update the datasheet at the link above every time a new ancient genome is published and has enough markers available to be tested with the Eurogenes K15. Eventually we might end up with a tool that covers most of the continents and many periods of history and prehistory.
I've done similar analyses of a variety of ancient genomes. For instance, StoraFörvar11, or SfF11, from Mesolithic Sweden came out 3/4 La Brana-1 and 1/4 MA-1. This translates to 3/4 Western European Hunter-Gatherer (WHG) and 1/4 Ancient North Eurasian (ANE), and lines up well with results recently reported for Swedish hunter-gatherers in the scientific literature. You can see the full analyses of StoraFörvar11 and a couple of other ancient genomes at the links below.
Analysis of Mesolithic Swedish forager StoraFörvar11
More ancient genomes from Sweden: Pitted Ware forager Ajvide58 and TRB farm girl Gokhem2
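A result like "3/4 La Brana-1 and 1/4 MA-1" can be illustrated with a simple two-source model. The model finds the fraction a that brings a*A + (1 - a)*B closest to the target in least-squares terms, clipped to the range 0 to 1. This is a minimal sketch with made-up component values. These are not the actual K15 results for StoraFörvar11, La Brana-1 or MA-1, and this is not necessarily how those analyses were run.

```python
import math

def two_way_mix(target, source_a, source_b):
    """Best fraction a so that target is approximately a*source_a + (1-a)*source_b.

    Ordinary least squares on the component vectors, clipped to [0, 1].
    Returns (a, residual_distance).
    """
    diff = [x - y for x, y in zip(source_a, source_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("the two sources are identical")
    a = sum((t - y) * d for t, y, d in zip(target, source_b, diff)) / denom
    a = min(1.0, max(0.0, a))            # keep the fraction physically meaningful
    fit = [a * x + (1 - a) * y for x, y in zip(source_a, source_b)]
    resid = math.sqrt(sum((t - f) ** 2 for t, f in zip(target, fit)))
    return a, resid

# Made-up three-component values for two sources and a target
a, resid = two_way_mix([60, 25, 15], [80, 20, 0], [0, 40, 60])
print(a, resid)  # a is 0.75 and the residual is zero for this toy target
```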
I'm still trying to answer a whole lot of e-mails so I won't be monitoring this post for a while. But please feel free to share your results and any tips you might have in the comments below.
Saturday, December 28, 2013
This is a test that attempts to fit you to the three inferred prehistoric European populations as described in this recent preprint. The relevant Excel file can be downloaded here, and all you have to do is stick your Eurogenes K13 results into the fields provided to get the EEF-WHG-ANE ancestry proportions. A modified version for Near Eastern and Southeast European users can be accessed here.
The test is based on correlations between the average levels of the Eurogenes K13 and the ancient components among selected European populations (see here). Below is a brief description of each of the ancient components.
Early European Farmer (EEF): apparently this is a hybrid component, the result of mixture between "Basal Eurasians" and a WHG-like population possibly from the Balkans. It's based on a 7,500-year-old Linearbandkeramik (LBK) sample from Stuttgart, Germany, but today peaks at just over 80% among Sardinians.
It's important to note that this test is only likely to be accurate for people of European ancestry, and indeed only those who aren't outliers from the main European clines of genetic diversity. For details of what that means, please consult the aforementioned paper. Roughly speaking, though, you should get a coherent result if you're of European origin and score no more than 3% in total for East Asian, Siberian, Amerindian, South Asian, Oceanian, Northeast African and/or Sub-Saharan admixture. Users from the Near East and Caucasus should run the version specifically designed for them, while those from Southeastern Europe might find it useful to run both calculators and then compare the results.
West European Hunter-Gatherer (WHG): this ancestral component is based on an 8,000-year-old forager from the Loschbour rock shelter in Luxembourg, who belonged to Y-chromosome haplogroup I2a1b. However, today the WHG component peaks among Estonians and Lithuanians, in the East Baltic region, at almost 50%.
Ancient North Eurasian (ANE): this is the twist in the tale, a component based on a 24,000-year-old Upper Paleolithic forager from South Central Siberia, belonging to Y-DNA R* and known as Mal'ta boy or MA-1. This component was very likely present in Southern Scandinavia from at least the Mesolithic, but only seems to have reached Western Europe after the Neolithic. At some point it also spread into the Americas. In Europe today it peaks among Estonians at just over 18%, and, intriguingly, reaches a similar level among Scots. The paper doesn't give numbers for Finns, Russians and Mordovians. According to one of its maps, they also carry very high ANE, but their results are confounded by more recent Siberian (ENA) admixture.
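As mentioned above, the calculator is based on correlations between population-average K13 results and the ancient components. One way such a calculator could work is to fit a least-squares regression from K13 averages to published EEF/WHG/ANE values, then apply it to a user's own K13 results. The sketch below assumes that approach with NumPy. It is not necessarily how DESUK1's spreadsheet works. The example data are made up and use three components instead of thirteen to keep things short. Clipping negatives and renormalising is also an assumed design choice.

```python
import numpy as np

def fit_calculator(component_averages, ancient_proportions):
    """Least-squares map from admixture averages to EEF/WHG/ANE proportions.

    component_averages: (n_pops, n_components) population averages (e.g. K13).
    ancient_proportions: (n_pops, 3) EEF/WHG/ANE values for the same populations.
    Returns an (n_components + 1, 3) coefficient matrix; row 0 is the intercept.
    """
    X = np.column_stack([np.ones(len(component_averages)), component_averages])
    # lstsq returns a minimum-norm solution, so the collinearity caused by
    # proportions summing to a constant does not break the fit
    coef, *_ = np.linalg.lstsq(X, ancient_proportions, rcond=None)
    return coef

def apply_calculator(coef, user_result):
    """Predict EEF/WHG/ANE for one user, clipping negatives and renormalising."""
    pred = np.concatenate([[1.0], user_result]) @ coef
    pred = np.clip(pred, 0.0, None)
    return pred / pred.sum()

# Made-up reference data: six populations, three components
ref_avgs = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
                     [0.5, 0.5, 0], [0.2, 0.3, 0.5], [0.3, 0.3, 0.4]])
ref_ancient = ref_avgs.copy()   # toy case: ancient values equal the components
coef = fit_calculator(ref_avgs, ref_ancient)
print(apply_calculator(coef, [0.2, 0.3, 0.5]))
```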
Thanks to project member DESUK1 for putting this together at such short notice, and MfA for the modified version. Please post your results in the comments section below and state your ancestry when you do. This will help us to improve the accuracy of the test. Relative to those of the reference samples, my results make perfect sense considering my Polish ancestry (see here: http://img24.imageshack.us/img24/2240/q1is.png).
Below that is a PCA courtesy of project member PL16, based on the EEF-WHG-ANE test results for selected populations. The positions of the ancestral EEF, WHG and ANE groups reflect the PCA loadings (see here).
This is my interpretation of who these components represent. Of course, this model might change when more ancient genomes are analyzed.
WHG and WHG/ANE: indigenous European hunter-gatherers
EEF: mixed European/Near Eastern Neolithic farmers
ANE/WHG: Proto-Indo-European invaders from the Eastern European steppe
ENA/ANE: early Uralics from the Volga-Ural region
EEF/WHG/ANE: late Indo-Europeans (i.e. Celts, Germanics and Slavs)
Iosif Lazaridis, Nick Patterson, Alissa Mittnik, et al., Ancient human genomes suggest three ancestral populations for present-day Europeans, bioRxiv, Posted December 23, 2013, doi: 10.1101/001552
Ancient human genomes suggest (more than) three ancestral populations for present-day Europeans
Ancient North Eurasian (ANE) levels across Asia
Thursday, November 21, 2013
The old Eurogenes K13 has been replaced by a new model with different, and hopefully more robust, ancestral clusters. The new version also includes Oracles as well as 2D and 3D Principal Component Analyses (PCA). The K13 population averages and genetic (Fst) distances between the inferred ancestral clusters are available here and here, respectively.
GEDmatch > Ad-Mix Utilities > Eurogenes > K13
Below is a 2D PCA based on the average K13 results of the European and Asian reference populations, courtesy of project member PL16.
Thus, Eurogenes now has four tests at GEDmatch with Oracles: the Jtest, EUtest, EUtest V2 and the K13. It's useful to keep in mind that these tests will differ in their interpretation of the data, and perhaps accuracy, depending on the ancestry of the user. For instance, the new K13 should be more useful for Central and South Asians than any of the others, because it features new reference samples relevant to them.
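Fst distances, like those posted for the K13 clusters, measure how differentiated two ancestral clusters are in allele frequency. One common estimator is Hudson's Fst, computed as a ratio of averages across SNPs. The sketch below uses its population-level form without a sample-size correction. That is an assumption suited to inferred cluster frequencies, such as those in ADMIXTURE's .P output. It is not necessarily how the posted Fst values were calculated.

```python
def hudson_fst(p1, p2):
    """Hudson-style Fst between two allele-frequency vectors (ratio of averages).

    Population-level form, no sample-size correction:
    sum((p1 - p2)^2) / sum(p1*(1 - p2) + p2*(1 - p1))
    """
    num = den = 0.0
    for a, b in zip(p1, p2):
        num += (a - b) ** 2
        den += a * (1 - b) + b * (1 - a)
    if den == 0:
        raise ValueError("no variable sites")
    return num / den

# Made-up frequencies of one allele at three SNPs in two clusters
print(hudson_fst([0.1, 0.5, 0.9], [0.2, 0.5, 0.7]))  # about 0.045
```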
Monday, October 7, 2013
This new test is essentially an upgraded version of the EUtest. Unlike the original, it includes an Amerindian component and five native reference populations from North and Central America, so it should be a lot more useful for users from the New World who are wondering about Amerindian admixture.
GEDmatch > Ad-Mix Utilities > Eurogenes > Eurogenes EUtestV2 K15
I just tried it myself, and have to say that the 4-Ancestors Oracle results were impressive. In other words, they were very accurate based on what I know about my recent ancestry. On the other hand, I'd say the default Oracle was picking up more ancient gene flows. However, this might not be the case for everyone, so let's hear some feedback, discuss the outcomes, and perhaps tweak the settings if necessary.
One of the most important things to keep in mind is to ignore all results under 1%. These are likely to be noise.
Here are the population averages and Fst distances between the ancestral components. Below are spatial maps of the main West Eurasian components courtesy of Gui (FR7): Baltic, North Sea, Atlantic, East Euro, West Med, East Med, West Asian.
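Following the advice to ignore results under 1%, a user's output can be filtered before it is compared with anyone else's. This is a minimal sketch. The percentages in the example are made up. The remaining values are deliberately not rescaled, so they may sum to slightly less than 100.

```python
def drop_noise(results, threshold=1.0):
    """Remove components below `threshold` percent, treating them as noise."""
    return {name: pct for name, pct in results.items() if pct >= threshold}

# Made-up results using component names from this test
example = {"Baltic": 25.4, "North Sea": 30.1, "Atlantic": 28.0,
           "Amerindian": 0.6, "West Asian": 15.9}
print(drop_noise(example))
```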