Thursday, 19 May 2011

Well that KEGGing sucks - but how much?

KEGG (Kyoto Encyclopedia of Genes and Genomes) is a hugely important database of genomic pathways and interactions that has been used daily by countless molecular biologists over the past 15 years (up to 200K unique web site visitors per month).

Even though the data sources that KEGG integrates to build its database are predominantly available to all, free of restriction, the KEGG database itself has traditionally carried a dual license - free for academic use, but non-free for commercial use through their Pathway Solutions licensing agent. I'm no great lover of dual licenses, as they discourage commercial use, thereby restricting translational application of the resource 'for the good of humanity'. Well, two days ago KEGG announced that it would go even further by charging academics up to $5000 to download the database (starting July 1st).

Can we use this unfortunate circumstance to assess the impact of limiting access to an established resource such as KEGG? And can we do it using a scientific measure that really matters: citations? Given that KEGG have around 1000 citations per year and a 15-year trading record, the returns for the next few years should be very revealing.

Footnote - funding for large integrated databases is notoriously difficult to maintain over the long term, even though the resources themselves are enormously valuable. In February, NCBI tackled budget challenges by throwing their SRA toys out of the pram (on which I have commented before), whereas KEGG have been far more pragmatic in looking for alternative sources of funding. I have huge respect for both projects.

Wednesday, 18 May 2011

Financial value of open vs closed science

I was visiting the University of East Anglia today, presenting to their Enterprise and Engagement Club. My rather experimental talk was titled "Extracting value from open science; a commercial perspective" (slides to be posted soon). I wanted to demonstrate my point using an example along the lines of: "project A with open science generated $big, whereas with closed science it would have only generated $small". Whilst not exactly that, I did find an interesting financial comparison from the human genome sequencing project:

  • Exhibit A, the perceived cost to the biotech industry of "opening" the human genome:
"In March 2000, President Clinton announced that the genome sequence could not be patented, and should be made freely available to all researchers. The statement sent Celera's stock plummeting and dragged down the biotechnology-heavy Nasdaq. The biotechnology sector lost about $50 billion in market capitalization in two days" [from Wikipedia].
  • Exhibit B, the total economic impact of the "open" human genome according to the recent Life Technologies-sponsored Battelle report:
    "The Human Genome Project [...] wasn't just a money-sucking vanity initiative that only reaped profits for personal genetic testing companies like 23andMe. The project has, in fact, driven $796 billion in economic impact and generated $244 billion in total personal income" [from Fast Company].

So, it looks like open science for the human genome project is about $200 billion (personal value) in the black! That's a fair chunk of change by any estimation.
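For anyone wondering where the "about $200 billion" comes from, here is the back-of-envelope sum, taking the Exhibit B personal income figure and the Exhibit A market-capitalisation loss at face value and treating them as directly comparable (an assumption, not an economic analysis):

$$
\underbrace{\$244\ \text{billion}}_{\text{personal income (Exhibit B)}} \;-\; \underbrace{\$50\ \text{billion}}_{\text{market cap lost (Exhibit A)}} \;=\; \$194\ \text{billion} \;\approx\; \$200\ \text{billion}
$$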

Disclaimer - I'm not an economist, and do not attempt to justify the validity of the comparison (other than both numbers having the same units) in any way.

Tuesday, 10 May 2011

Open science; Open for business?

OK, let's get started.

What is open science? Well, my personal take is this - the scientific process has four outputs:
1. Methods/protocols that
2. generate data that
3. can be documented/published, thereby
4. contributing new insight/knowledge to the scientific corpus.
I define 'open science' as any of the above that is made freely available for anyone to use, with few if any restrictions. For example:
1. Methods => Open source
2. Data => Open data
3. Documentation => Open access
4. Knowledge => Open innovation (although formally open innovation can include patented/licensed ideas)
In the context of publicly funded science, I assert that the 'value' created by open scientific output generally exceeds that of its closed counterparts, although I'll leave the argument itself for a later post. And a truism: this value will never be realised unless it is actively extracted, in the same way that a field crop withers and dies unless harvested.

Unfortunately, it is the very process of extracting value from open science (the business of open science, the trade in open science) that is widely ignored by the scientific community. Indeed, the assumption that peer-reviewed publication is the only scientific currency was the premise of this Guardian piece: 'Why won't Open Science work?'.

So, do we need a new way of dealing in the existing currency, epitomised by open access publishing? Or do we need a new currency, nano-publications perhaps? Whichever it is (and I'm desperately clutching for an appropriate idiom), where there's value, human nature will eke it out, come what may.