Thursday, May 17, 2012

A scientific basis for Open Source Software

Stefan Steiniger of the OpenJUMP project pointed out this great paper in Nature, "The case for open computer programs". The paper raises the argument for open source software to a higher plane: that of being a necessary component of scientific proof. It points out that because computational science is increasingly used as a basis for scientific discovery, releasing source code must become a standard part of documenting research. Apparently some journals, such as Science, already require source code to be supplied along with article submissions. Amongst other advantages, access to source code is an essential element of peer review.

An interesting example they mention is the infamous HadCRUT and CRUTEM3 meteorological datasets. One of the (few) salient criticisms levelled at this information during Climategate was the inability to reproduce the results by re-running the software. (Mind you, the software was probably a pile of crufty old Fortran programs mashed up by Perl scripts, so maybe it's just as well.)

I'm looking forward to seeing JTS get cited in academic papers (actually, it already has been). Maybe I even have a finite Erdős number!

It's maybe too much to ask that mere scientists be coding hipsters, but I noticed that SourceForge is presented as the leading example of collaborative software development.  Someone should introduce them to GitHub - which truly walks the talk.  Researchers in bioinformatics should be especially appreciative of the sweeping effect of recombinant software development it enables.

10 comments:

Jeffrey said...

This is exactly why many mathematicians would rather use SAGE, an open source alternative to mathematical software like MATLAB and Maple. I don't see why this has to be a controversial topic; it's just common sense, especially for scientists. Software is not immune to error; in fact, errors happen quite frequently.

Jeffrey said...

Great post, by the way!

Unknown said...

"One of the (few) salient criticisms levelled at this information during Climategate was the inability to reproduce the results by re-running the software. (Mind you, the software was probably a pile of crufty old Fortran programs mashed up by Perl scripts, so maybe it's just as well.)"

Open source analysis software is a good start, but we should go further. Analysis is far too often a kludged together mashup (whether in Fortran, Excel, Python or anything else).

I would like to see some kind of web tool where you could store your data and manage your source code, with a place to write wiki-like comments. If we had used something like that when I was a physicist, it would have prevented several cases where somebody had done some work, only to have it collectively forgotten before publication.

Then imagine it scaled up into a big public climate repository.

Dr JTS said...

Great idea. A sort of GitHub for scientists?

Wes McDermott said...

Some NASA climate code is available, see http://data.giss.nasa.gov/gistemp/ and http://www.giss.nasa.gov/tools/

Also, the Clear Climate Code project (clearclimatecode.org) is working on re-implementing climate code (starting with GISTEMP) with an emphasis on clarity.

mentaer said...

Of course JTS has been cited frequently already; just search for "JTS topology suite" on scholar.google.com :)
I am sure many more people have used it in research. But there's also the problem of figuring out which document to cite...

Dr JTS said...

@mentaer: Nice! Hadn't thought of doing that search.

Isn't it also an option to cite the project site directly? Of course, there's still a bit of confusion there. I'd like to see the Tsusiat site become the main reference, with the SF site as the alternative.

Dr JTS said...

@Jeffrey:

Yes, agreed, this should just be the normal way of doing research, and not be controversial. But politics has a way of intruding where there is money and vested interests involved.
