Monday, December 3, 2018

Auto-Clustering of DNA Matches

In the Facebook genetic genealogy groups, I've seen one subject discussed pretty frequently over the past couple of weeks that I hadn't heard of before: auto-clustering of DNA matches. The result looks like a spreadsheet of your DNA matches, color coded and clustered into groups of matches who also match each other. The idea is that the people in any given colored cluster should all be descended from one common ancestral couple. This seems like a better way to compare in-common matches. Here is how you do it.


First, go to https://www.geneticaffairs.com and create an account. When you open an account, you are given 200 credits, which you spend on the various updates and analyses. You can purchase additional credits if you wish. If you're an amateur genealogist researching only your own family, I expect these credits to last a pretty long time.

Once your account is created, create your first profile. Go to "Websites & Profiles" and click "Add Website".


Select the DNA testing company where this profile is located, then enter the username and password. As an information security professional, I recommend setting the password to one you are comfortable sharing, then changing it once you're done using Genetic Affairs. Do this for any company you choose to give logon information to. This way you won't have to keep track of everyone you gave your password to. Once you change it, they'll no longer have access.

Once the profile is created, go to "Websites & Profiles" and click "Manage All Profiles". From here, you can generate a spreadsheet of your matches to be emailed to you.
While this can be handy, you can also get this from the company you tested with. I know you can do this with Family Tree DNA, so you may not need to spend your Genetic Affairs credits on it. Also, when you generate your auto-cluster, they send you one of these spreadsheets too, so there is really no need to request it separately. For this reason, I recommend you change the "Update Interval" setting to "Never".

Then comes the most useful tool I've seen in a long time, the auto-cluster. On your profiles screen, select the circular arrows icon under "AutoCluster".


I first tried using Approach A. This did not help me at all. It showed a very large number of two-person clusters, which is not going to help me line up clusters with common ancestors. I next tried Approach B, with the range running from "2nd Cousin - 4th Cousin" to "3rd - 5th Cousin". While first cousin matches will help you figure out which grandparent couple a given cluster is related through, this tool works best when looking for great-grandparents and further back. Most of the time, finding out which grandparent a cluster is from won't help you with your genealogy. Once you get beyond 5th cousins, you begin losing DNA that was passed down from your 4x-great-grandparents. While going further out can help you on the more distant lines, I doubt you'd want to start at that level.
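To make the range idea concrete, here is a minimal Python sketch of keeping only the matches that fall between two predicted-relationship labels. This is not how Genetic Affairs does it internally. The match records, the `in_range` helper, and the first and last labels in the list are made up for illustration; only the two middle labels come from the settings described above.

```python
# Hypothetical ordering of predicted-relationship labels, closest first.
# Only the two middle labels appear in the post; the others are placeholders.
RANGES = [
    "1st Cousin",
    "2nd Cousin - 4th Cousin",
    "3rd - 5th Cousin",
    "Remote Cousin",
]


def in_range(matches, start, end):
    """Keep matches whose label sits between start and end, inclusive."""
    lo, hi = RANGES.index(start), RANGES.index(end)
    return [m for m in matches if lo <= RANGES.index(m["range"]) <= hi]


# Made-up match list, one record per DNA match.
matches = [
    {"name": "Ann", "range": "1st Cousin"},
    {"name": "Bob", "range": "2nd Cousin - 4th Cousin"},
    {"name": "Cal", "range": "3rd - 5th Cousin"},
    {"name": "Dee", "range": "Remote Cousin"},
]

selected = in_range(matches, "2nd Cousin - 4th Cousin", "3rd - 5th Cousin")
print([m["name"] for m in selected])
```

The closest and most distant matches drop out, which mirrors the reasoning above: very close cousins only point to grandparents, and very distant ones share too little DNA to be a good starting point.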

Here's what my auto-cluster looks like using the settings I just spelled out:


The names along the top are the same names as those along the left side. That's why you see a dark line running diagonally through the chart: the dark squares show each person matching themselves. Each of the colored squares is a cluster of individuals who match me and match each other. White spots within a square show pairs of individuals in that cluster who don't match each other.
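If it helps to see the chart as data, here is a rough Python sketch of the same layout. It is only an illustration, not the method Genetic Affairs uses: it groups people with a simple "connected through shared matches" rule, and the names, the pairs, and the `cluster_matches`, `white_spots`, and `render` helpers are all made up.

```python
from itertools import combinations


def cluster_matches(names, shared_pairs):
    """Group people who are linked, directly or indirectly, by shared matches."""
    neighbors = {name: set() for name in names}
    for a, b in shared_pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)

    clusters, seen = [], set()
    for name in names:
        if name in seen:
            continue
        stack, group = [name], []
        seen.add(name)
        while stack:  # walk everyone reachable from this person
            current = stack.pop()
            group.append(current)
            for other in sorted(neighbors[current]):
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        clusters.append(sorted(group))
    return clusters, neighbors


def white_spots(cluster, neighbors):
    """Pairs inside one cluster who do not match each other."""
    return [(a, b) for a, b in combinations(cluster, 2) if b not in neighbors[a]]


def render(clusters, neighbors):
    """Text chart: '#' is the diagonal, 'x' a shared match, '.' no match."""
    order = [name for cluster in clusters for name in cluster]
    rows = []
    for row_name in order:
        cells = []
        for col_name in order:
            if row_name == col_name:
                cells.append("#")
            elif col_name in neighbors[row_name]:
                cells.append("x")
            else:
                cells.append(".")
        rows.append("".join(cells))
    return rows


# Made-up matches: Ann-Bob and Bob-Cal match each other, as do Dee-Eve.
names = ["Ann", "Bob", "Cal", "Dee", "Eve"]
shared_pairs = [("Ann", "Bob"), ("Bob", "Cal"), ("Dee", "Eve")]

clusters, neighbors = cluster_matches(names, shared_pairs)
for line in render(clusters, neighbors):
    print(line)
print(white_spots(clusters[0], neighbors))
```

Because the same names are used for rows and columns, the `#` marks fall on the diagonal. Ann and Cal land in the same cluster through Bob even though they don't match each other, so that pair is reported as a white spot.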

When you get right down to it, each of the squares is the same thing you'd see in a Family Tree DNA match matrix. What makes it so powerful is that you don't have to select the people yourself. It's fully automated and you can see all the clusters at once. With the matrix tool, you're limited in how many people you can add. With auto-clustering, there is no limit. I've seen charts for very endogamous populations where the entire chart was one big cluster of inter-related individuals.

I knew I had a large number of in-common matches with many of my 2nd-4th cousins. I didn't realize just how many, or how many different clusters they were in. Looking at the above chart, I don't know a common ancestor for any of the groups until I get down to Cluster 4, which is not shown because the first three take up so much space.

While I have yet to fully dive into exploring my auto-cluster chart, I believe it will help me make sense of my in-common matches and hopefully track down some common ancestors, and likely their origins in Germany and Ireland.

I encourage you to give the auto-clustering tool offered by Genetic Affairs a try. All the cool kids are doing it! But really, this is the first new tool I've seen in a long time that looks like it could give me a good amount of help in sorting out my genetic genealogy.

--Matt

1 comment:

  1. I understand that Genetic Affairs has temporarily suspended auto-clustering. It sounds like the tool is having problems accessing the data. I know it had problems with Ancestry beginning yesterday and it sounds like there may be more problems. Since it's a new tool and is having problems, suspending it temporarily while they sort it out is probably the best move. Here's hoping they get it sorted out quickly. This is the first powerful new tool for genetic genealogy research that has come around in quite a while. Whenever it comes back online, please be sure to give it a try. I honestly believe that using this, I'll be able to make some headway on some brick walls.
