Friday, February 5, 2016

#ParsimonyGate: The Perspective of a Reformed ‘Hardcore’ Cladist


If you are reading this article, you have probably read the now infamous editorial in the journal Cladistics http://onlinelibrary.wiley.com/doi/10.1111/cla.12148/full . Although it is signed “The Editors,” it isn’t clear whether it was approved by anyone besides the current Head Editor, who is most certainly a “hardcore cladist” (someone who thinks parsimony is the most reasonable, if not the only, tool for inferring the historical relationships of organisms through phylogenetics). I distinguish between “hardcore cladists” and just “cladists” because I think that I am a cladist, and that most other systematic biologists are too. A cladist, in my personal definition, is anyone who distinguishes between plesiomorphic characters (“primitive” features shared with a designated outgroup) and apomorphic characters (derived features distinct from the outgroup condition). Under my broader definition, basically everyone doing morphological or molecular work to discover the relationships of organisms is a cladist, and it doesn't matter whether you are using parsimony, likelihood, or Bayesian approaches. The only exceptions are folks who use overall similarity (e.g., bats and birds are close relatives because they both have warm blood and wings). That approach doesn’t distinguish analogy (convergence of characters) from homology (characters derived from common ancestry), because it doesn’t follow the Hennigian, or cladistic, principle of distinguishing plesiomorphic from apomorphic characters.
The view of cladistics I outline above is basically the foundation of Willi Hennig’s 1966 book “Phylogenetic Systematics,” which is the bible of the Willi Hennig Society (publisher of Cladistics) and the foundation of modern systematic theory. At the time, the idea of distinguishing between plesiomorphy and apomorphy was radical. The famed evolutionary biologist Ernst Mayr was the first to call those following Hennig’s principles “cladists” (as a pejorative, by the way). Mayr preferred doing “evolutionary taxonomy,” basically where the expert on a group makes a hypothesis about the relationships of organisms based on the characters they think are most important for supporting those relationships (e.g., owls, eagles, and hawks are all each other’s closest relatives because these “raptors” all kill with their feet). The other alternative in those early days was numerical phenetics, whose practitioners used overall similarity to group organisms, as I explain above. (Read more about this interesting time in history in David Hull’s “Science as a Process.”) The original cladists weren’t fighting for parsimony; they were fighting to use only derived characters in phylogenetics. Parsimony came along a little later, with the work of several groups, mainly from the University of Michigan and the American Museum of Natural History. For the early cladists, parsimony was the only game in town, and it was used mainly to understand the transition of morphological characters from primitive to derived. Then, with the rise of molecular tools for obtaining DNA characters, came new methods for inferring trees: model-based approaches, including maximum likelihood and eventually Bayesian inference. Systematists of all sorts would meet at the annual Systematic Zoology/Biology meetings until the “hardcore cladists” decided to break away and hold their own meeting, that of the Willi Hennig Society, founded in 1980.
Now I should mention that I trained as a systematist at both the University of Michigan and the American Museum of Natural History, the hotbed of cladistics and Hennig worship, albeit late in the game, in the early 2000s. I was a hardcore cladist for most of my early graduate career. I thought parsimony was the only reasonable way to infer relationships because it wasn’t a model-based approach like maximum likelihood or Bayesian inference. Those models, I thought and was taught, made too many assumptions. Parsimony, by contrast, wasn’t a model, because the foundation of that idea is to “minimize ad hoc assumptions about homoplasy” (i.e., to reduce the noise in the tree from character states that arise more than once or reverse). Using parsimony, the shortest tree, the one requiring the fewest steps (or evolutionary transitions), is the best tree. Period. The other methods were using models to guesstimate far too much from DNA sequences about how often an A (adenine) turns into a C (cytosine) or a T (thymine) into a G (guanine). It was crazy how much assuming those crazy-assuming people were doing. If they just did some morphology, they would better understand how all this stuff really worked, and that there is only one true religion, I mean method: parsimony. We were the Jedi knights who stuck to our principles; those other folks just weren’t thinking it through. Then something happened: I saw the light.
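The “shortest tree” criterion can be made concrete with a minimal sketch of Fitch’s algorithm, which counts the fewest state changes a single unordered character needs on a fixed tree; summing over characters gives the tree length. The four taxa (A, B, C, D), the three-character matrix, and the nested-tuple tree encoding are all hypothetical, and a real analysis would search over many candidate trees rather than score two by hand.

```python
"""Count parsimony steps on a fixed tree with Fitch's algorithm (illustrative sketch)."""
from typing import Dict, Set, Tuple, Union

# A tree is either a leaf name or a (left, right) pair of subtrees (rooted, binary).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_steps(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (candidate ancestral states, number of changes) for one character."""
    if isinstance(tree, str):
        # A leaf contributes its observed state and no changes.
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch_steps(left, states)
    right_set, right_steps = fitch_steps(right, states)
    shared = left_set & right_set
    if shared:
        # The two children can agree, so no extra change is needed here.
        return shared, left_steps + right_steps
    # The children disagree, so one change must be counted at this node.
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree: Tree, matrix: Dict[str, str]) -> int:
    """Sum Fitch steps over every character (column) in the matrix."""
    n_chars = len(next(iter(matrix.values())))
    total = 0
    for i in range(n_chars):
        column = {taxon: row[i] for taxon, row in matrix.items()}
        total += fitch_steps(tree, column)[1]
    return total


# Hypothetical matrix: three binary characters scored for four taxa.
matrix = {"A": "001", "B": "001", "C": "110", "D": "111"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

print(tree_length(tree_1, matrix))  # 3
print(tree_length(tree_2, matrix))  # 5
```

Here tree_1 needs 3 steps and tree_2 needs 5, so parsimony would prefer tree_1. Fitch counts don’t depend on where the tree is rooted, so the rooted tuple encoding is only a convenience.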
I realized at one point that there isn’t a single right way to study historical relationships. We can’t actually know the truth about who is related to whom when discussing organisms that diverged millions of years ago. We are also using methods that are extremely computationally intensive. They are all models, even the heuristic we use to run parsimony. No computer on Earth can exhaustively check every possible tree for more than a dozen or so species under parsimony or likelihood: there are just too many possible answers, so we have to rely on heuristic searches that may miss the best tree. When we study a historical science using morphological characters or DNA, we will never be sure we are right. As I started using DNA methods more, I realized I wanted to better understand when these lineages started to diverge. I needed to put a rough age on a group, and to do that I needed to use likelihood and Bayesian methods, because only those use evolutionary models from which you can understand how DNA sequences change over time. I slowly found myself using these other methods more and more. Did I still use parsimony? Sure, sometimes, but it gave me the same answer as those other methods, just with less information (e.g., a tree without branch lengths or information about time). The relationships themselves are interesting, but I also wanted to know about evolution and biogeography beyond the tree.
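The “too many possible answers” problem can be checked with the standard count of unrooted, fully bifurcating trees for n labeled species, (2n - 5)!!, the product of the odd numbers 3 × 5 × ... × (2n - 5) for n ≥ 3. The minimal sketch below just evaluates that product; it counts candidate topologies only and says nothing about any particular program’s run time.

```python
def unrooted_tree_count(n: int) -> int:
    """Number of distinct unrooted binary trees for n labeled taxa, (2n - 5)!!."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (an empty product when n == 3).
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


for n in (4, 5, 10, 12, 20):
    print(n, unrooted_tree_count(n))
```

Ten species already give 2,027,025 unrooted trees and twelve give 654,729,075; by twenty species the count is roughly 2.2 × 10^20, which is why exhaustive searches give way to heuristics.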
Now, I’m still a cladist, and I hope I can count many friends among those in the Willi Hennig Society. (I named my dog Willi.) The folks in that society helped me think more clearly about methods and the philosophy of systematics, and also about the limits of what we can know in general (epistemology). They have invited me to give talks at their annual Hennig meetings, and I always learn a lot at this conference. Many people are intimidated by these meetings because many senior members do yell at each other, but they are friends in the end, trying to improve each other’s work. They do sometimes pick on folks who aren’t their friends, and that isn’t cool. You do have to bring your “A” game to Hennig, because there are no concurrent sessions and there is unlimited time for questions. You always have to explain why you picked one method over another; it isn’t about using the newest method, it is about justifying your choice. (Much like the Cladistics editorial was trying to say. I think.) Compared to other meetings, where there are often few, if any, questions, even after a terrible talk, I actually think Hennig is doing it right. They have many fewer members than other major systematic societies, so they have the luxury of having just one session at a time and an open-ended question period. The Hennig conference is also strongly skewed male, which is a problem they really need to fix. Many senior members of the society need to tone it down a bit, too. They can be crass and pedantic and use jargon as a weapon to make semantic arguments over relatively mundane things (“how can you test a model with a model”; “is there such a thing as an order-quantifiable metric of similarity”). I still publish in Cladistics (as recently as last year), and that paper even had a Bayesian analysis in it. Although I let my membership lapse a few years ago, I’m not opposed to going to another meeting in the future.
I think the editorial they published is a step backwards only because it sounds so uninviting: “If alternative methods give different results and the author prefers an unparsimonious topology, he or she is welcome to present that result, but should be prepared to defend it on philosophical grounds.” Many read that as, “You can submit non-parsimony things but you need to explain why, and even if you explain why, we still might not like it because parsimony.”
I think the editorial was a mistake because it sounded like they will only accept the parsimony answer if other methods give you alternatives. That makes the journal “Hardcore Cladistics,” when it was, at least recently, just “Cladistics.” I do hope they reconsider their stance, or at least clarify it. I still consider Cladistics a great journal, one that I enjoy reading because of its organismal focus on systematics. I haven’t had editors or reviewers tell me I need to do a parsimony analysis or remove a likelihood or Bayesian analysis, but I’ve heard that others may have. Time will tell if the journal and the society can right the ship; unfortunately, it was a storm of their own creation that has it teetering.

Tuesday, January 26, 2016

Learning “R” in Spain


Studying turtles with R. Julien Claude in the background.
The sun rising from Montserrat.
From January 17-23, my new PhD student A.J. Turner and I went to a small town near Barcelona in Cataluña, Spain. We were there to take a morphometrics course in R (more info here: Transmitting Science). For the uninitiated, R is a programming language and environment that can be used to manipulate data, conduct analyses, and make beautiful figures, among other things. We would like to use R to measure and compare the shapes of various fish species to better understand how body shapes change over the life of an organism, how these shapes evolve among and between groups, and how to use information about shape to better understand the changing forms of fishes over time. A.J. is quite clever and smart, and he will one day be able to use this tool to make his cutting-edge dissertation even more cutting-edge. I was once clever and smart too, but I’ve felt a little dumb post-tenure. I saw this class as an opportunity to retool, to reshape (pun intended) some old projects, and to think of new ones. It didn’t hurt that the course took place in beautiful Spain. The course was taught by Julien Claude, who is the author of the book “Morphometrics with R” and a co-author of the R package “ape,” which has been cited thousands of times. There were about twenty other students from around the world; some were studying the shapes of dinosaur bones, fruits, or flowers, among countless other projects. Almost all of them brought data to play with and manipulate. From 9am to 7pm for five days, we were on our computers going through dozens of examples and exercises. It was rather intense, especially for me, not having been a student since I got my PhD almost 10 years ago. Except for some short breaks and meals, we were engrossed in R all day. The group of students and instructors was a disparate mix of international students, postdocs, and PIs. Luckily everyone was very nice, and A.J. and I ended up with twenty or so new friends and maybe some future co-authors. I particularly liked Julien.
He and I share a rather silly and nerdy sense of humor. A running joke about one of the students being from the future and taking this class to destroy R, like the Terminator, had us giggling for days for some strange reason. (It might be that writing ten hours of computer code a day makes almost everything else hilarious.) Speaking of bad jokes: Do you know the favorite coding language of pirates? … R! By Day 4 my brain was full, and I needed to take a bit of a break from the dark classroom and spend some time outdoors. A.J. and I got up at 5am and took a cab to the top of beautiful Montserrat to watch the sun rise over Cataluña. We found ourselves walking around the grounds of a rather breathtaking basilica at the top of Montserrat. There were monks chanting, bells ringing, and beautiful rows of multicolored candles lit for prayer. The sun rising over the mountains was stunningly beautiful, as were the paintings and décor inside the monastery. The monastery has a famous dark-skinned Virgin Mary statue that reminded us of this part of Spain’s rich African history. Cataluña is home to an interesting mix of cultures, something that sets it notably apart from the rest of Spain. We were often greeted with “Bon dia” in the morning (similar to the Portuguese “Bom dia”), and with “merci” in place of “gracias.” But alas, we only had time to learn one new language, and we were back to learning the grammar and culture of R in our classroom that same morning. In R we say hello like this: setwd("/Users/Prosanta/Desktop/MorphometricswithR2016/datasets"). Our visit to Montserrat lasted just a few hours, but luckily we also had a few free hours when we landed in Barcelona. My postdoc Fernando Alda is from Spain, and I wouldn’t have been able to look him in the eyes if I didn’t tell him we saw at least some sights while we were in his home country. Fortunately, we were also able to see the Sagrada Familia on our way to the course on the day we landed.
The Sagrada Familia is the infamously beautiful/hideous giant church designed by Spain’s most influential architect, Antoni Gaudi. Construction began in 1882, and Gaudi, inspired by biological shapes (apparently all biological shapes all at once), took over the project the following year; it won’t be finished until 2020 (maybe). The building is impossible to describe in words, but let’s just say I don’t think Gaudi would have been good at R. Then again, I think our instructor Julien Claude could make anyone good at R.
Julien is a patient and kind instructor who made sure every student had grasped the current set of skills before moving on, and who also understood that we each had different goals, projects, and kinds of data. For me, learning elegant new tests of hypotheses about modularity (the degree to which shape changes in one body part independently of another) or fluctuating asymmetry (small, random departures from perfect symmetry across a body’s axis) was worth the price of admission. I already have new projects in mind and hope to help some students learn new morphometric techniques. A.J. and I are extremely grateful to our Department of Biological Sciences and Office of Research and Economic Development for the opportunity to attend the R class in morphometrics.
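As a rough illustration of what fluctuating asymmetry means numerically, the sketch below compares hypothetical left- and right-side measurements of one trait. Directional asymmetry would show up as a mean signed (right minus left) difference well away from zero, while fluctuating asymmetry shows up as scatter of those differences around zero. This is a simplified single-trait stand-in, not the landmark-based methods from the course, and real studies also take repeated measurements to separate true asymmetry from measurement error.

```python
from statistics import mean, pvariance
from typing import Dict, Sequence


def asymmetry_summary(left: Sequence[float], right: Sequence[float]) -> Dict[str, float]:
    """Summarize signed right-minus-left differences for one bilateral trait."""
    if len(left) != len(right) or not left:
        raise ValueError("left and right need the same, non-zero number of specimens")
    diffs = [r - l for l, r in zip(left, right)]
    return {
        # Mean signed difference: far from zero suggests directional asymmetry.
        "mean_signed": mean(diffs),
        # Mean absolute difference: a simple magnitude of asymmetry per specimen.
        "mean_abs": mean(abs(d) for d in diffs),
        # Population variance of the signed differences around their mean.
        "var_signed": pvariance(diffs),
    }


# Hypothetical fin-ray lengths (mm) for four fish, left and right sides.
left = [10.0, 12.0, 11.0, 9.0]
right = [10.5, 11.5, 11.0, 9.5]
print(asymmetry_summary(left, right))
# {'mean_signed': 0.125, 'mean_abs': 0.375, 'var_signed': 0.171875}
```

Working with signed differences, rather than only their absolute values, is what lets the two kinds of asymmetry be told apart.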





A.J. Turner at La Sagrada Familia