Math Genealogy Project
I traced my mathematical lineage back to the 14th century at The Mathematics Genealogy Project. Imagine my surprise when I discovered that a big branch in the tree of my scientific ancestors is made up not of mathematicians, but of big names in the fields of Physics, Chemistry, Physiology and even Anatomy.
There is some “blue blood” in my family: Garrett Birkhoff and William Burnside (both algebraists), and Archibald Hill, who shared the 1922 Nobel Prize in Physiology or Medicine for his elucidation of the production of mechanical work in muscles. Hill is regarded, along with Hermann Helmholtz, as one of the founders of Biophysics.
Thomas Huxley (a.k.a. “Darwin’s Bulldog”, biologist and paleontologist) participated in that famous debate in 1860 with the Lord Bishop of Oxford, Samuel Wilberforce. This was a key moment in the wider acceptance of Charles Darwin’s Theory of Evolution.
My namesake Franciscus Sylvius, another professor of Medicine, discovered the cleft in the brain now known as Sylvius’ fissure (circa 1637). One of his advisors, Jan Baptist van Helmont, is the founder of Pneumatic Chemistry and a disciple of Paracelsus, the father of Toxicology (for some reason, the Mathematics Genealogy Project does not list either of these two in my lineage; I wonder why).
There are other big names among the branches of my scientific genealogy tree, but I will save that discovery for the end of the post, as a nice punch line.
Posters with your genealogy are available for purchase from the pages of the Mathematics Genealogy Project, but they are not very flexible in terms of either layout or design in general. A great option is, of course, doing it yourself. With the aid of python, GraphViz and the sage library networkx, this becomes a straightforward task. Let me show you a naïve way to accomplish it:
Let us start by searching for a name in the online database. Once the page is on screen, note the string of numbers at the end of the URL, right after id.php?id= (mine is 113998).
Every individual in the database has a unique ID in this fashion. Note also, in the source of the page, the field Advisor: if the advisor of an individual is known, the page links to the advisor's own page. We can easily retrieve the ID of the advisor from the source code. Even better, we can code a small script in python to recursively go upwards in the database, gathering the IDs of your ancestors in a dictionary. One fast way to accomplish this could be as follows:
    import urllib

    def augment_genealogy(subject_id, subject_tree):
        # This function assumes that subject_id is not in subject_tree
        f = urllib.urlopen("http://genealogy.math.ndsu.nodak.edu/id.php?id=" + subject_id)
        subject = f.read()
        f.close()
        if not subject.count("Advisor: Unknown"):
            # How many advisors did subject have?
            # For each advisor, retrieve their information, and attach
            # them to subject as parents
            advisor_list = []
            for advisor in range(subject.count("Advisor")):
                # Skip past the next "Advisor" marker, then past the "id=" of its link
                subject = subject.partition("Advisor")[2].partition("id=")[2]
                # The ID runs up to the closing quote of the href
                advisor_id = subject[0:subject.index("\"")]
                advisor_list.append(advisor_id)
                if advisor_id not in subject_tree:
                    augment_genealogy(advisor_id, subject_tree)
            subject_tree[subject_id] = advisor_list
            return subject_tree
        else:
            subject_tree[subject_id] = []
            return subject_tree
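To experiment with the traversal without hitting the server, the same idea can be sketched with the page download passed in as a function. This is only an illustration written for Python 3 (the script above uses Python 2's urllib.urlopen). The names build_tree, ADVISOR_LINK and FAKE_PAGES are mine, and the three fake pages are made up: they imitate only the two things the script looks for, the word "Advisor" and a link ending in id=<number>.

```python
import re

# Assumed page layout: the word "Advisor", then a link whose href ends in id=<digits>.
ADVISOR_LINK = re.compile(r'Advisor[^<]*<a href="id\.php\?id=(\d+)"')

def build_tree(root_id, fetch):
    """Map each ID reachable from root_id to the list of its advisors' IDs.

    fetch is any callable that turns an ID string into that page's text.
    """
    tree = {}
    pending = [root_id]
    while pending:
        person = pending.pop()
        if person in tree:
            continue  # already visited through another student
        page = fetch(person)
        if "Advisor: Unknown" in page:
            tree[person] = []
        else:
            tree[person] = ADVISOR_LINK.findall(page)
        pending.extend(tree[person])
    return tree

# Hypothetical pages standing in for the real site, so the sketch runs offline.
FAKE_PAGES = {
    "1": 'Advisor 1: <a href="id.php?id=2">B</a> Advisor 2: <a href="id.php?id=3">C</a>',
    "2": 'Advisor: <a href="id.php?id=3">C</a>',
    "3": 'Advisor: Unknown',
}

toy_tree = build_tree("1", FAKE_PAGES.__getitem__)
```

An explicit list of pending IDs replaces the recursion, so a very long lineage cannot hit the recursion limit. To query the real site, fetch would be a function that downloads and decodes the page for a given ID.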
But what good is a genealogy tree if we cannot see the names of the ancestors? The simple script below takes care of retrieving the names online for each of the IDs in our previously created dictionary. Of course, one could instead include the appropriate string manipulation in the retrieval script above, and kill two birds with one stone.
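    # Create a dictionary that maps to each id, its name
    # (data is the dictionary returned by augment_genealogy)
    names = dict()
    for id in data:
        f = urllib.urlopen("http://genealogy.math.ndsu.nodak.edu/id.php?id=" + id)
        s = f.read()
        f.close()
        # The name sits in the page title, right after this prefix
        name = s.partition("The Mathematics Genealogy Project - ")[2]
        # Keep everything up to the next tag, dropping non-ASCII characters
        names[id] = unicode(name[0:name.index("<")].strip(), "ascii", "ignore").encode()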
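The script above simply drops any character that is not ASCII, which mangles names with accents. A possible variation, written for Python 3, keeps them instead. Treat it as a sketch under two assumptions of mine: that the page bytes are UTF-8, and that accented letters may arrive as HTML entities. The function name extract_name and the sample title are made up for the illustration.

```python
import html
import re

TITLE_PREFIX = "The Mathematics Genealogy Project - "

def extract_name(raw_page):
    """Return the name found after the title prefix, or None if it is absent."""
    # Assumption: the page is UTF-8; undecodable bytes become U+FFFD.
    text = raw_page.decode("utf-8", errors="replace")
    match = re.search(re.escape(TITLE_PREFIX) + r"([^<]*)", text)
    if match is None:
        return None
    # Turn entities such as &eacute; into the characters they stand for.
    return html.unescape(match.group(1)).strip()

# Hypothetical title line, not taken from the real database.
sample = b"<title>The Mathematics Genealogy Project - Jos&eacute; Example</title>"
sample_name = extract_name(sample)
```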
These are the names in my tree, for example: Do you recognize any of them?
Let us move to sage and plot a graph of what we have obtained. Let us try with my ID ("113998"). We have different graphing options with networkx:
    import networkx
    import matplotlib.pyplot

    data = augment_genealogy("113998", dict())
    Tree = networkx.MultiDiGraph(data)

    matplotlib.pyplot.figure(figsize=(50,50))
    pos = networkx.spring_layout(Tree, iterations=900)
    networkx.draw(Tree, pos, node_size=0, alpha=0.4, font_size=24)
    matplotlib.pyplot.savefig('/Users/blanco/Desktop/tree.png')
    matplotlib.pyplot.figure(figsize=(50,50))
    pos = networkx.shell_layout(Tree)
    networkx.draw(Tree, pos, node_size=0, alpha=0.4, font_size=24)
    matplotlib.pyplot.savefig('/Users/blanco/Desktop/tree.png')
Because of the cooperative nature of scientific work, it is not unusual to find several ancestors from different generations linked to the same advisor. This usually destroys the binary-tree structure that one would expect for this type of graph, making the plotting of the data quite a challenge.
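To see how far a given genealogy is from a plain tree, one can count the ancestors that appear as advisor of more than one person, and the generation in which each ancestor ends up. The sketch below (Python 3, standard library only) works on a dictionary shaped like the one built earlier, mapping each ID to a list of advisor IDs. The helper names and the toy data are made up for the illustration.

```python
from collections import Counter

def shared_ancestors(tree):
    """Return the IDs that are listed as advisor of more than one person."""
    counts = Counter(a for advisors in tree.values() for a in set(advisors))
    return sorted(a for a, n in counts.items() if n > 1)

def generations(tree, root):
    """Map every ID reachable from root to its longest distance from root."""
    depth = {}

    def visit(person, level):
        if depth.get(person, -1) >= level:
            return  # already placed at this generation or a later one
        depth[person] = level
        for advisor in tree.get(person, []):
            visit(advisor, level + 1)

    visit(root, 0)
    return depth

# Toy data: "3" advised both "1" and "2", and "2" also advised "1".
toy = {"1": ["2", "3"], "2": ["3"], "3": []}
shared = shared_ancestors(toy)
levels = generations(toy, "1")
```

Here "3" is both a parent and a grandparent of "1". That is exactly the situation that keeps the graph from being a tree, and the longest distance is one reasonable choice of row for such a node in a layered drawing.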
An easy workaround is to install GraphViz and hand the graph over to it from sage. The following script attaches to each label the corresponding name, and produces the desired genealogy tree:
    import os

    T = DiGraph(data)
    T.graphviz_to_file_named('/Users/blanco/Desktop/tree.dot')

    # Here is where we change the labels from numbers to names
    # We need to do it this way to avoid the issue "different people with same name"
    fr = open("/Users/blanco/Desktop/tree.dot", "r")
    workingString = fr.read()
    fr.close()

    newString = str()
    while workingString.count("[label=\""):
        # Split at the next label: (text before, '[label="', text after)
        breakString = workingString.partition("[label=\"")
        newString += breakString[0] + breakString[1]
        stepString = breakString[2]
        # stepString starts with the ID; write the name in its place
        newString += names[stepString[0:stepString.index("\"")]] + "\""
        # Continue after the closing quote of the old label
        workingString = stepString.partition("\"")[2]
    newString += workingString

    fw = open("/Users/blanco/Desktop/newtree.dot", "w")
    fw.write(newString)
    fw.close()

    os.system('open -a /Applications/GraphViz.app /Users/blanco/Desktop/newtree.dot')
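An alternative to patching the labels afterwards is to write the DOT file directly from the two dictionaries. The IDs stay as node identifiers, so two people with the same name remain distinct, and the names go in as labels. The following is a minimal sketch in Python 3. The function genealogy_to_dot and the toy data are mine, not part of sage or networkx. Edges run from each person to their advisors, like the dictionary passed to DiGraph above.

```python
def genealogy_to_dot(tree, names):
    """Return DOT source with IDs as node identifiers and names as labels."""
    def quote(text):
        # Inside a quoted DOT string, embedded double quotes must be escaped.
        return '"' + text.replace('"', '\\"') + '"'

    lines = ["digraph genealogy {"]
    for person in sorted(tree):
        # Fall back to the bare ID when no name was retrieved.
        label = names.get(person, person)
        lines.append("  %s [label=%s];" % (quote(person), quote(label)))
    for person in sorted(tree):
        for advisor in tree[person]:
            lines.append("  %s -> %s;" % (quote(person), quote(advisor)))
    lines.append("}")
    return "\n".join(lines)

# Toy data, made up for the illustration.
toy_tree = {"1": ["2"], "2": []}
toy_names = {"1": "Student One", "2": "Mentor Two"}
dot_source = genealogy_to_dot(toy_tree, toy_names)
```

Writing dot_source to a file with the .dot extension gives something GraphViz can open in the same way as newtree.dot above.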