## DNA Match Manager Helper

[DNA Match Manager]:https://heirloomsoftware.com/dna-match-manager/
[DNA Match Manager Helper]:https://github.com/Monty/DNA_Match_Manager_Helper
[csvkit]:http://csvkit.rtfd.org
[columns]:DNA_Match_Manager-FieldNames.md
[Windows Subsystem for Linux]:https://docs.microsoft.com/en-us/windows/wsl/faq

**[DNA Match Manager][]** is a free program that downloads DNA match data from
Ancestry, 23andMe, FamilyTreeDNA, GEDmatch, and MyHeritage (*in minutes instead
of days*). However, some important numeric data in its output files, such as
Shared cMs, is in a [different column for each site][columns]. That makes it
difficult to sort into a useful order.

The shell script **makeDNASpreadsheet.sh** overcomes that problem by extracting
and coalescing data from those files into a summary spreadsheet sorted by
Shared cMs, Shared %, and then Match Name so that the closest DNA relatives are
at the top.
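As a rough illustration of that sort order, the sketch below sorts already-coalesced rows with the standard `sort` utility. It is not taken from **makeDNASpreadsheet.sh**: the helper name `sort_matches` is hypothetical, the sample rows are made up, and it assumes a headerless CSV with no quoted commas whose fields follow the column order listed in this README (2 = Match Name, 6 = Shared %, 7 = Shared cMs).

```shell
#!/usr/bin/env bash
# Sketch only: sort_matches is a hypothetical helper, not part of the script.
# Assumes headerless, unquoted CSV rows on stdin with
# field 2 = Match Name, field 6 = Shared %, field 7 = Shared cMs.
sort_matches() {
    # -t,       fields are comma-separated
    # -k7,7nr   Shared cMs, numeric, descending
    # -k6,6nr   Shared %, numeric, descending
    # -k2,2     Match Name, ascending
    LC_ALL=C sort -t, -k7,7nr -k6,6nr -k2,2
}

# Made-up sample rows, for illustration only.
printf '%s\n' \
    'Ancestry,Zed Young,,,M,1.5,105,5,40,3rd Cousin' \
    'GEDmatch,Amy Able,,P,F,25.0,1750,40,120,2' \
    '23andMe,Bob Baker,,,M,1.5,105,6,38,Second Cousin' |
    sort_matches
```

The row with the most Shared cMs comes first; rows that tie on both numeric fields fall back to Match Name order.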

The resulting spreadsheet contains the following columns:

| Column | Description |
|--------:|:-------------|
| Source Site | Site the data came from - *e.g. Ancestry, 23andMe, etc.* |
| Match Name | Name of the person who is a DNA match |
| Email | Email address of the Match (if provided) |
| Side | P, M, or blank - *i.e. P[aternal] or M[aternal]* |
| Gender | M, F, or blank |
| Shared % | Percentage of DNA shared - *computed if not provided* |
| Shared cMs | Total amount of DNA shared (*in cMs*) |
| Shared Segs | Total number of DNA segments shared |
| Longest Seg | Size (*in cMs*) of largest matching segment |
| Relationship | Varies - *e.g. GEDmatch uses a number instead of text* |


### Prerequisites

In addition to **[DNA Match Manager][]**, you'll need a few standard tools such
as bash, awk, and grep, which are installed by default on both macOS and Linux,
and are installed by **[Windows Subsystem for Linux][]** on Windows 10.

### Compatibility

Tested on macOS Mojave and Catalina, and on Ubuntu 18.04. Should work on
Windows 10 if **[Windows Subsystem for Linux][]** is installed.

### Instructions

1. Clone or download **[DNA Match Manager Helper][]**.

2. Run **[DNA Match Manager][]** to collect data from any sites you use. Save
(*in the directory created in step 1*) either a file per site or a single
file containing all sites. Include **one** of these key strings in the name you
choose for each output file. Be consistent in your methodology!

    * 23andMe
    * Ancestry
    * FTDNA
    * GEDmatch
    * MyHeritage
    * AllSites - *Only if you choose to use a single file to contain all data*

    For example, instead of the default
**MatchManagerExport_1.13.2020.2.48.28PM.csv** save your 23andMe match data
as **Match-23andMe_1.13.2020.2.48.28PM.csv** (*or any other name containing
"23andMe"*).

3. In a terminal window, type: **`./makeDNASpreadsheet.sh`** to generate a
match summary spreadsheet and (*if there are any*) a list of matches added
since the previous run. They will have *timestamped* names such as
**Relatives-*2020-01-11.103106*.csv** and **Additions-*2020-01-11.103106*.csv**.

    On macOS you can simply double click **`run.command`** in a Finder window
(*or right-click and select 'Open'*) which will automatically open a Terminal
window and run **`./makeDNASpreadsheet.sh`** for you.

    By default, only matches sharing about 100 cMs or more are saved (*the
script's built-in minimum is 99.5*). This is appropriate for adoptees or others
seeking close DNA relatives. If you are doing genealogy research, a lower
minimum will be more helpful.

    To change the minimum cMs saved, use the **-m** switch. For example, in a
terminal window type: **`./makeDNASpreadsheet.sh -m 19.5`**. ***Note:*** *If you
want 20 cM, enter 19.5, as some websites show cMs rounded up to the next whole
number.*
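To make the effect of the **-m** switch concrete, here is a minimal sketch of how such an option could be parsed and applied. It is an assumption-laden illustration, not the code in **makeDNASpreadsheet.sh**: only `cMs_min` and `-m` come from this README, while `parse_min` and `filter_by_min` are hypothetical helpers, the input is assumed to be headerless CSV with Shared cMs in field 7, and whether the real script compares with "greater than" or "greater than or equal" is not known here.

```shell
#!/usr/bin/env bash
# Sketch only: parse_min and filter_by_min are hypothetical helpers.
cMs_min=99.5    # default minimum named in this README

# Parse an optional "-m <number>" and override the default minimum.
parse_min() {
    local OPTIND=1 opt
    while getopts ":m:" opt; do
        case "$opt" in
            m) cMs_min="$OPTARG" ;;
            *) echo "usage: $0 [-m minimum_cMs]" >&2; return 1 ;;
        esac
    done
}

# Keep rows from stdin whose Shared cMs (assumed to be field 7) is at least
# cMs_min. The ">=" comparison is an assumption.
filter_by_min() {
    awk -F, -v min="$cMs_min" '($7 + 0) >= (min + 0)'
}
```

With these helpers, calling `parse_min -m 19.5` and then piping rows through `filter_by_min` would keep a match at 20 cMs and drop one at 19 cMs.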

### ProTips

Create a permanent master spreadsheet from the **Relatives-*timestamp*.csv** file
and save it as a non-CSV spreadsheet. Use that spreadsheet to make notes, add
missing data, and change erroneous relationships (*e.g. second cousin to first
cousin once removed, etc.*). Add as many columns as you need for things like
most recent common ancestor, birth year, location, common surnames, or data
from [site-specific columns][columns].

Don't delete the **Relatives-*timestamp*.csv** file produced by this script. It
will be used by later runs to produce an **Additions-*timestamp*.csv** file
containing data that can be cut/pasted into your master spreadsheet. You can
either paste that data one line at a time into its proper slot, or paste it all
at the bottom and then sort it by Shared cMs (*descending*), Shared %
(*descending*), and then Match Name (*ascending*) to move it into the correct
slot.
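The idea behind the Additions file can be sketched with awk: remember every match in the previous Relatives file, then print only the rows of the new one that were not seen before. This is a hedged illustration rather than the script's actual logic. The helper name `find_additions` is hypothetical, and identifying a match by Source Site plus Match Name (fields 1 and 2 of an unquoted CSV) is an assumption.

```shell
#!/usr/bin/env bash
# Sketch only: find_additions is a hypothetical helper.
# Usage: find_additions PREVIOUS.csv CURRENT.csv
# Prints rows of CURRENT whose (Source Site, Match Name) pair is not in PREVIOUS.
find_additions() {
    local previous="$1" current="$2"
    awk -F, '
        { key = $1 FS $2 }                            # assumed identity of a match
        FILENAME == ARGV[1] { seen[key] = 1; next }   # first file: record known matches
        !(key in seen)                                # second file: print new matches
    ' "$previous" "$current"
}
```

Because a header line appears in both files, it is treated as already seen and is not repeated in the output.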

There is no reason to run **`./makeDNASpreadsheet.sh`** again until you know or
suspect you have a new relative that shares more DNA than the minimum cMs
saved. Run it monthly if you don't keep a close watch on your DNA testing
sites.

If you want to see what added matches will look like without waiting for new
ones to show up on the web, just reduce the minimum cMs slightly, e.g.
**`./makeDNASpreadsheet.sh -m 99`**

If you want to permanently change the minimum cMs saved, edit
**makeDNASpreadsheet.sh** and change the line **`cMs_min=99.5`** to a
different value.

Enjoy!