Pregnancy Testing and Me

In the dim recesses of my memory is a breakfast meeting overlooking the Charles River, at the Howard Johnson’s Hotel on Memorial Drive, attended by Tom Benjamin, David Livingston, Irving Berstein and me. Irving was a serial entrepreneur whom I’d met through a Wash. U. friend who was working as his assistant at his day job, running the Health Sciences and Technology Program at MIT.

I had just finished my PhD, was freshly back from a job interview at Genentech, and was planning a post-doc at Harvard. But my funding had been cut because of Ronald Reagan’s federal budget freeze. It was the winter of 1981-1982, Reagan’s first budget, and biotech (hardly a word then) was feeling the first effects of presidential disdain for research.

Irving had an idea: there could be a better pregnancy test than the one already on the market, one with antibodies coupled to a color change that would make it easy to tell when the test was positive. Tom Benjamin and David Livingston were there to assure Irving that it was possible. I was there to (hopefully) help write a business plan. I was out of a job, had successful grant-writing experience, knew the field, and could no doubt make a contribution.

[Image: a 1980s magazine ad for Irving’s “me too” test]

Except I needed to make a living.

One thing I did not then understand about entrepreneurs was how penurious they were.

The meeting was mostly about which business plan objectives could be met within a given time frame and how large an effort they would require. To my knowledge, formal design controls weren’t introduced until 15 years later, so we had no notion of them. But there was a technique, pioneered at Harvard Business School (within sight of our breakfast table), for writing detailed business plans complete with projected cash flows. I’d read the popularized versions, mainly authored by Joe Mancuso, a Worcester Polytechnic Institute graduate who’d written widely sold books on the subject. I hadn’t yet met Joe.

After the meeting, I went back to my one-bedroom apartment near Davis Square in Somerville, an apartment jammed with boxes sent by my then-girlfriend in Austin, who was about to make the trek to Boston to live with me. She was the main reason I hadn’t accepted the Genentech post-doc. One of my close friends was at Genentech in South San Francisco, but I hadn’t seen him during the job interview (he was a protein person; I was nearly all nucleic acids, all the time). I knew I might be making a mistake by not going west, but the timing wasn’t right for me personally, so I intended to do the best I could for myself at Harvard.

Irving called. I had started on the business plan, but he wanted to discuss salary. The starting salary for a post-doc in those days, after years of training, was $13,300. He wanted to pay me $12,000. I was flabbergasted. It had been difficult enough to go from a $9,600-a-year salary as a technician in St. Louis to a $3,000 stipend for the past four years at Tufts; but now, to be cut down to only a bit more than I’d made as a technician five years earlier was too much. I argued for more, but Irving wouldn’t budge. Ultimately, he said he would give me $12,000 for 1982 and $14,000 for 1983, and wasn’t that the same? I couldn’t believe my ears. I turned him down.

Eventually, Bill Haseltine offered me a post-doctoral position at the Sidney Farber Cancer Institute and within a few months, I was working just downstairs from David Livingston.

But every once in a while, I checked in with Irving, who had moved into a lab and office in Waltham to keep his new venture going. He told me the research was more difficult than he’d anticipated and was taking longer.
I even visited once, when I was looking for a new position in late 1984. By then, he had reformatted the test and was ready to submit it to the FDA as substantially equivalent to the tests already on the market, with the mantra that it could determine pregnancy “six days earlier” than its competitors.

In the end, Irving had a “me too” product: not a great innovation, but a reasonably well-thought-out new home test. Large chains picked it up, commercials were made, and Irving had another great hit before his next incarnation at Texet. Yes, I introduced him to that company, where he was CEO for several years. But I’ll leave that story for another day.

The FDA Precision Pathogen Detection Challenge

Sometime in January of this year, I was surfing the internet and came upon a microbial identification challenge sponsored by the US FDA. The challenge was about to get underway, and it reminded me of a project I’d started with the Army’s Biotechnology High Performance Computing Software Applications Institute (BHSAI) and the National Cancer Institute (NCI) in Frederick, Maryland. The aim of that project, begun in 2015, was to rapidly identify battlefield and hospital pathogens, ideally using a small mobile DNA sequencer (like the Oxford Nanopore MinION) and a cloud-based system for taxonomic identification. The unique twist I brought to the project was experience with an exacting local best-match algorithm, Smith-Waterman, for the identification step.
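For readers who haven’t met it, Smith-Waterman finds the best-scoring local alignment between two sequences by filling a dynamic-programming table whose cells are floored at zero, so a poor stretch never drags down a good match elsewhere. The sketch below is a plain, unoptimized Python version that returns only the best score. The scoring values (match +2, mismatch -1, linear gap -2) are illustrative assumptions, and this is not the striped (vectorized) implementation inside MosaikAligner.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a vs b, using a linear gap penalty.

    Scoring parameters are illustrative defaults, not any aligner's settings.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap    # gap in b
            left = H[i][j - 1] + gap  # gap in a
            # Flooring at zero is what makes the alignment local rather than global
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


# A short probe embedded in a longer sequence scores as a perfect local hit
print(smith_waterman("GGGGACGTCCCC", "ACGT"))  # 8
```

The floor at zero is the one line that separates this from global (Needleman-Wunsch) alignment, which is why a short read can score well against a long genome.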

But by February 2018, I was slammed with work starting up a new genetic test, and never thought I’d have time to compete in the challenge. Yet, as luck would have it, around the middle of the month I was out of a job!

After a few weeks of R&R, I set about enrolling in and competing in the challenge. I revived my old code, tried working the examples on my MacBook Pro laptop (the only computer conveniently at my disposal), and started running the 4-core machine continuously, overnight. My 2012-vintage machine had been top of the line when it was purchased, and I’d put in a 2 TB hard drive late in February. It was time to teach the Fast Artificial Neural Network (FANN) to adjust the quality scores for the FDA sequences of interest. Time to focus, then, on identifying Salmonella enterica subsp. enterica serovar Newport and distinguishing it from all other strains!
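Whether the “quality scores” in question are per-base qualities or the mapping qualities (MAPQ) that MOSAIK’s neural-network training is designed to calibrate, both live on the Phred scale, Q = -10 log10(P_error). The Python sketch below only illustrates that standard conversion and the Phred+33 FASTQ encoding; it is not FANN and not the recalibration actually performed.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Phred scale: Q = -10 * log10(P_error), so P_error = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> int:
    """Inverse conversion, rounded to the nearest integer Phred score."""
    return round(-10 * math.log10(p))


def decode_fastq_quality(qual: str, offset: int = 33) -> list:
    """Decode a Phred+33 (Sanger / Illumina 1.8+) FASTQ quality string to integer scores."""
    return [ord(c) - offset for c in qual]


# MAPQ 35 -> estimated mis-mapping probability of 10^-3.5, printed as 3.16e-04
print(f"{phred_to_error_prob(35):.2e}")
```

Read this way, a MAPQ cutoff of 35 keeps alignments whose estimated chance of being misplaced is roughly 3 in 10,000.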

The motivation for the detection software was the nosocomial outbreak of carbapenem-resistant Klebsiella pneumoniae (KPC) that killed 11 patients at the NIH Clinical Center in 2011. That, combined with my friend Chris Mason’s report that Bacillus anthracis could be found in the NYC subway, provided ample impetus for running alignments and writing scripts.

In 2013-14, it was still early days for Oxford Nanopore sequencing, but I was confident that the error profile would improve with time. It seemed foolish not to begin work.

And so I labored over the program, hoping at some point to publish the work with a small cadre of colleagues. Here is the précis I included with the FDA challenge submission; I hope to provide more details in a month or two:

Each sample was aligned to the 14 whole-genome Newport strains listed in the MBGD database (Uchiyama et al., 2015) using MosaikAligner 2.2.3 (Lee et al., 2014), after converting the inputs to MOSAIK’s binary format with MosaikBuild. The aligner uses a striped Smith-Waterman implementation to search for local homology and the Fast Artificial Neural Network (FANN) library to calibrate mapping quality scores. Alignment parameters included aligning all reads to all positions, a maximum mismatch percent threshold of 0.1, a minimum percent alignment threshold of 0.5, a hash size of 15, and a hash position threshold of 100. A Perl script was then used to retain only alignments with a mapping quality (MAPQ, as reported by samtools) of at least 35 and a perfect match of 150 bp or greater. All computing was done on a mid-2012 MacBook Pro with a 2.9 GHz Intel Core i7 processor and 8 GB of 1600 MHz DDR3 RAM. The serovar was identified by comparing counts of qualifying alignments across the reference strains. Additional work involved alignment to the 408 Salmonella FASTA files listed in the MBGD database; this final step was not completed in time for submission to the challenge. A manuscript describing the methods in more detail is in preparation.
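The filtering and counting steps in the précis can be sketched roughly as follows. This is a hypothetical Python reconstruction, not the Perl script that was actually used. It reads plain SAM text, keeps alignments with MAPQ of at least 35, and interprets “a perfect match of 150 bp or greater” as a single M/= CIGAR run of at least 150 bases with an NM (edit distance) tag of 0; that interpretation is an assumption. It then counts qualifying alignments per reference sequence, so the reference with the most hits points to the best-supported strain. The reference names in the example data are made up.

```python
import re
from collections import Counter
from typing import Iterable

# CIGAR operations as defined in the SAM specification
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def longest_match_run(cigar: str) -> int:
    """Length of the longest single M or = operation in a CIGAR string (0 if none)."""
    runs = [int(n) for n, op in CIGAR_RE.findall(cigar) if op in "M="]
    return max(runs, default=0)


def passes_filter(fields: list, min_mapq: int = 35, min_perfect: int = 150) -> bool:
    """Assumed reading of the filter: MAPQ >= min_mapq, NM:i:0, and an
    aligned block of at least min_perfect bases."""
    flag, cigar = int(fields[1]), fields[5]
    if flag & 0x4 or cigar == "*":  # 0x4 = read unmapped
        return False
    if int(fields[4]) < min_mapq:  # column 5 is MAPQ
        return False
    # Optional tags look like "NM:i:0": 2-letter key, then type, then value
    tags = {t[:2]: t[5:] for t in fields[11:]}
    if tags.get("NM") != "0":  # NM = edit distance to the reference
        return False
    return longest_match_run(cigar) >= min_perfect


def count_by_reference(sam_lines: Iterable[str], **kwargs) -> Counter:
    """Count qualifying alignments per reference name (SAM column 3)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        if passes_filter(fields, **kwargs):
            counts[fields[2]] += 1
    return counts
```

For a real run, something like `count_by_reference(open("sample.sam")).most_common(3)` would list the top references (the filename is a placeholder). If the aligner writes BAM, it would first need converting to SAM text, for example with `samtools view`.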

References:

Lee WP, Stromberg MP, Ward A, Stewart C, Garrison EP, Marth GT. MOSAIK: a hash-based algorithm for accurate next-generation sequencing short-read mapping. PLoS One. 2014 Mar 5;9(3):e90581. doi: 10.1371/journal.pone.0090581. eCollection 2014. PubMed PMID: 24599324; PubMed Central PMCID: PMC3944147.

Uchiyama I, Mihara M, Nishide H, Chiba H. MBGD update 2015: microbial genome database for flexible ortholog analysis utilizing a diverse set of genomic data. Nucleic Acids Res. 2015 Jan;43(Database issue):D270-6. doi: 10.1093/nar/gku1152. Epub 2014 Nov 14. PubMed PMID: 25398900; PubMed Central PMCID: PMC4383954.

Acknowledgement:
The author wishes to thank the FDA for motivating participation in the challenge and John Plaschke, Stefan Stefanov of NCBI, Kate Im, Brian Bushnell and Chris Mason for supporting earlier versions of this work.