Wasting my time...

One of the most irksome aspects of working in computational biology is how frustrating it can be to analyze other people's data (OPD) [1]. By OPD, I don't mean quickie files generated for personal use; rather, I'm talking about datasets ostensibly provided so that other folks can build upon, or at the very least, replicate published work. I'm talking about anything from supplementary material included with papers and/or software, to big taxpayer-funded public databases.

Here's a typical scenario: I need to combine two or more pieces of data, such as a list of human disease-associated variants identified in a study with some database of previously published variant associations. Conveniently, both datasets use the same format for identifying variants, which means that this should boil down to matching up a particular column in each of the tables. This shouldn't take more than five minutes, right?

Unfortunately, I quickly notice that some proportion of variants aren't being found in the database, even though the referenced origin of said variants is in there. Fifteen minutes of searching reveals that many of these are just typos; the others I'll have to check in more detail. I decide that I'd better write a script that cross-references the references [2] against the variants to catch any further mistakes, but this ends up spitting out a lot of garbage. Some time later, I realize that one of the tables doesn't stick to a consistent referencing style [3], so I can either go through the column and fix the entries manually, or try to write a script that handles all possibilities. A few hours later, I've finally got the association working, minus a dozen or so oddball cases that I'll have to go through one-by-one, only to find out that much of the numeric data I wanted to extract in the first place is coded as 'free text'. Now I'll need to write more code to extract the values I want. However, it's now 7 pm, and this will have to wait until tomorrow.
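For the curious, here's roughly what the first pass of that kind of script tends to look like. This is only a sketch, not the code I actually ran: the rs-style identifiers, the function names, and the example strings are all made up for illustration. It matches identifiers after trimming whitespace and ignoring case, keeps a list of whatever still doesn't match so it can be checked by hand, and pulls the first number out of a free-text field.

```python
import re


def normalize_id(raw):
    """Trim whitespace and lowercase an identifier so trivial differences don't block a match."""
    return raw.strip().lower()


def match_variants(study_ids, database_ids):
    """Split study identifiers into those found in the database and those that are not."""
    known = {normalize_id(v) for v in database_ids}
    matched, unmatched = [], []
    for raw in study_ids:
        if normalize_id(raw) in known:
            matched.append(raw)
        else:
            unmatched.append(raw)  # candidates for typos; needs a manual look
    return matched, unmatched


# Matches integers, decimals, and scientific notation such as 5e-8.
NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?")


def first_number(text):
    """Return the first number found in a free-text field, or None if there isn't one."""
    found = NUMBER.search(text)
    return float(found.group()) if found else None


# Hypothetical example data, not taken from any real dataset.
matched, unmatched = match_variants([" rs123 ", "RS456", "rs789"], ["rs123", "rs456"])
odds_ratio = first_number("OR = 1.35 (reported in text)")
```

Note that `first_number` blindly takes the first number it sees, so a field that mentions an identifier or a year before the value of interest will still need special handling.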

I've encountered this sort of problem many, many times when working with scientific data. Why are we so tolerant of poorly formatted, error-ridden, undocumented datasets? Or, perhaps a more appropriate question is why don't scientists have more respect for each other's time? Is it more reasonable for the dataset generator to spend a little bit of time checking the reliability of their data programmatically or for each person who downloads the data to waste hours (or days) working through typos and errors?
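To give a sense of how little work 'checking programmatically' can be, here's a minimal sketch of the kind of sanity check a dataset generator could run before release. Everything specific in it is an assumption on my part: the column names (`variant`, `reference`, `effect`), the rs-style identifier pattern, and the 'Surname YYYY' reference style are hypothetical stand-ins for whatever conventions a real table claims to follow.

```python
import re

# Hypothetical conventions; swap in whatever the dataset actually promises.
ID_PATTERN = re.compile(r"^rs\d+$")
REF_PATTERN = re.compile(r"^[A-Z][A-Za-z'-]+ \d{4}$")  # e.g. "Smith 2010"


def check_rows(rows):
    """Return (row_number, problem) pairs for rows that fail basic checks."""
    problems = []
    seen = set()
    for number, row in enumerate(rows, start=1):
        if not ID_PATTERN.match(row["variant"]):
            problems.append((number, "malformed variant id"))
        elif row["variant"] in seen:
            problems.append((number, "duplicate variant id"))
        seen.add(row["variant"])
        if not REF_PATTERN.match(row["reference"]):
            problems.append((number, "inconsistent reference style"))
        try:
            float(row["effect"])
        except ValueError:
            problems.append((number, "non-numeric effect"))
    return problems


# Made-up rows for illustration only.
rows = [
    {"variant": "rs1", "reference": "Smith 2010", "effect": "1.2"},
    {"variant": "rs1", "reference": "smith et al (2010)", "effect": "about 1.2"},
    {"variant": "RS 2", "reference": "Jones 2011", "effect": "0.9"},
]
problems = check_rows(rows)
```

A check like this won't catch a typo that happens to produce another valid-looking identifier, but it would flag the inconsistent references and free-text numbers before anyone else has to.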

I get it: after spending months writing and rewriting a manuscript, rarely do you feel like spending a lot of time polishing off the supplementary materials. Mistakes happen simply because you're in a rush to get a draft out the door. On the more cynical side, I have also been told that spending time making it easier for people to use my data isn't worth my time. Neither of these considerations explains errors found in public databases, however.

I don't have a solution to the problem, but I'm pretty sure that the root cause is one of incentives: that is to say, there are few professional incentives for making it easier for your colleagues (competitors) to replicate and/or build upon your work. Perhaps we need a culture shift towards teaching better values to students or, more realistically, we need journals to actually require that data follow minimal standards, perhaps including requiring that mistakes in supplementary tables be fixed when pointed out by downstream users.


[1] Who's down with OPD? Very few folks, I'm afraid.

[2] Cross-referencing has always struck me as the lamest, most overused 'nerd word' on TV. I cross-reference all the time, but I think this is the first time I've actually referred to it as such.

[3] e.g., [First author's first name] YYYY. I wish I was making this up.

Crossing the Rubicon...

The past six months have been pretty tumultuous: I worked hard to get a number of interesting projects going, found out that I was soon going to become a dad, and decided to leave academia. I'm a bit ashamed to admit it, but it's the last one that's given me the most angst.

I know that I'm far from unique in never having seriously considered alternatives to applying for professorships: from the moment you begin grad school, you're socialized to believe that being a professor at a research university is the true measure of success in the basic sciences. I suppose that I had such a great time as a student that I never really questioned this. However, my postdoc experiences, coupled with the general negativity surrounding the state and future of the academic job market, have changed my feelings.

See, as a grad student I frequently observed postdocs scrambling to get things done as quickly as possible. This involved a lot of 'doing' without much thinking about how to do it well. It seemed obvious to me that taking the time to read the literature, writing detailed and commented scripts, and trying multiple approaches paid off in the end. 

Once I became a postdoc, I realized that I was falling into the exact same traps. With multiple projects going, time became extremely precious. Putting effort into improving my skills with this-or-that analysis language/software/statistical method was time that could be spent doing more analyses. And I always had multiple datasets sitting on my desk. Furthermore, keeping up with the literature in my fast-moving field became challenging, meaning that reading about things outside of the direct scope of my many projects was out of the question. Finally, I began skipping more and more seminars in order to have more time to get through the endless pile of work. 

Sure, I could have put in more time than the six days a week that I was averaging. Similarly, I could've read more papers at home. But I also realized pretty quickly that having 'hobbies' and spending time with my significant other were also important to me.

Regardless, all of this work paid off: I got three great papers in three years, and set the foundations for more down the road. It was time to 'strike while the iron was hot' and apply for jobs. So I began putting together faculty applications, writing research and teaching statements, and looking into available positions. But as I started contemplating what I'd be doing for the next five years, I began to think more and more about the pros and cons of the academic life.  

If you take a step back and look at careers in academia, it's difficult not to notice that things appear to have been getting worse for years. Competition for a very limited number of positions is now extremely fierce, and most aspiring academics don't get to choose where they'd like to work. Rather, they take whatever is available. Even after getting a job, securing funding has also become more competitive than ever. Most assistant professors that I know work an unbelievable number of hours, writing endless grants in the hope of securing a coveted 'R01' as soon as possible.  

I'm hardly the first person who's been in the situation of beginning a family in the midst of big career changes, but I realize that I never want to be in the position where I feel torn between having to work vs. spending time with my son. Sure, I could look for a position at a less-competitive institution, but then I'd likely have to move to a place where I wouldn't want to live and settle for a salary that's insulting given the number of years that I've put into 'training'.

Then there's the so-called 'two-body problem'. My girlfriend is also a postdoc, and she'd like to pursue a career in the biotech industry. Relatively few places in the country have many job opportunities for someone of her qualifications (or mine, for that matter). Luckily, the San Francisco Bay Area is one of them and we both really like it here. Yes, the cost of living is astronomical, but there are many great amenities, including the fabulous weather. 

So, after a long, hard bout of self-reflection, I realized that while I love doing research, the grim reality of academia just isn't exciting to me anymore [1]. In fact, I realize that the only reason I've been so dead set on pursuing this track is because so many of my friends, colleagues, and mentors keep telling me that I should. With this in mind, I decided to explore alternatives and researched a number of industry positions. 

It's funny that I say 'alternatives' since the majority of PhDs do not go into academia. Nevertheless, I, like others, had no idea what to expect [2]. After sending out applications and speaking to a few hiring managers, I accepted a job with a company that's doing exciting stuff [3]. I'm going to be breaking totally new ground, both for myself and my employer, and while it's a bit daunting, I'm pretty excited. Actually, I'm excited about staying in a place that I enjoy as I've now spent a decade-and-a-half feeling like everything around me is temporary. Plus, I'll be making enough money to enjoy it - this is important as there's a baby on the way.

With all of this said, I was thinking of converting my previously 'semi-professional' website into more of a blog, and chronicling my experiences leaving the ivory tower and starting a family. I miss the days of blogging regularly as I did in grad school, and with any luck, I'll now have more time to do it! Stay tuned for more as it develops.


[1] I wonder if most academics realize how negative the atmosphere in the field can be. It's become the norm now to post grim and dire statistics about the state of funding/hiring/believability-of-results on social media on a daily basis. Given how much academics complain about their jobs, it's amazing that competition for said jobs is so fierce. But then again, maybe I'm just one of those 'disgruntledocs' who's 'getting what I deserve', whatever that means. 

[2] I've come to the conclusion that universities, and especially grad-schools, are overly focused on producing future academics. I doubt that I'm an outlier in saying that I received very little advice about what companies are looking for, or even 'do' for that matter. This could be the subject of a future post.

[3] Obviously I won't be able to talk about my work, but I'll talk about the company when I get there.