25 January 2013
By Darren Logan
Magnus Manske, Head of Informatics in the Malaria Programme at the Sanger Institute, uses the techniques and approach he learnt helping to build Wikipedia to help build the LookSeq software that allows researchers to visualise DNA sequence. Credit: Genome Research Limited
Every 25 January, Wikipedians around the world celebrate Magnus Manske Day. Why was the contribution of one of the Sanger Institute’s malaria researchers so important that Jimmy Wales declared the first Wikipedia Day in honour of him?
I’m a keen Wikipedian (most users of the site know me as Rockpocket) and a member of Faculty at the Sanger Institute. So when I first arrived at the Institute a few years ago, I was astonished to learn that the Magnus Manske was among my fellow researchers. The man who helped create the foundations of Wikipedia is now working as Head of Informatics in the Malaria Programme.
Back at the beginning of the 21st Century Magnus was a PhD student studying biology at the University of Cologne. As an avid reader of computer magazines, he became aware of a fledgling website called Nupedia. Its lofty aim, to be a comprehensive open access online encyclopaedia of peer-reviewed information, intrigued him. Keen to contribute, Magnus wrote some of the first biology articles for the site.
Unfortunately, Nupedia’s strength was also its weakness: its rigorous expert review system meant months often passed before submissions were published on the live site, a rather frustrating process. To kick-start the creation of articles, the collaborators launched a side project that enabled anyone to add new content to an encyclopaedia immediately. Its name was Wikipedia.
Magnus was immediately hooked and began writing articles. Just, he tells me, “for the fun of it.” He was among the very earliest contributors to Wikipedia, and was responsible for creating articles on many fundamental scientific topics including plasmid, Charles Darwin, and polymerase chain reaction. For this alone I, along with many others, owe Magnus a debt of thanks. But he didn’t stop there.
Wikipedia had a fundamental flaw too. Its strength in allowing world-wide collaboration to create and edit thousands of pages simultaneously was also its weakness. Within a few months, it began to struggle under the ever-increasing numbers of articles and contributors. Running on a single server and storing its articles in flat text files, the site’s basic Perl scripts couldn’t cope with the sheer volume of edits. Wikipedia was a victim of its own success and the whole project was in danger of stalling.
Magnus, as he tells me modestly, found he had some spare time during his semester holiday, so he decided to write an entirely new program to put Wikipedia on a footing that would allow it to grow in the future. Without being asked, he created a system that put all the information into a stable, scalable backend database. Not only that, but he also wanted to add a number of new features that would be useful for organising the different kinds of pages on the site.
Why did he do it? He says he saw the need, thought he could help, and wanted to learn a new programming language called PHP that he hadn’t used before.
Remarkably, it took him just two weeks. He submitted his program to Jimmy Wales and the rest of the Wikipedia team and they spent another six months testing and refining his software. Almost a year after Wikipedia first launched, and 11 years ago today, the site migrated across to Magnus’ free, open source software – now known as Mediawiki.
Magnus applies a similar ethos to his work in helping scientists understand the genetics of malaria: by writing open source, freely-available software that is vital for others to conduct their genetics research. One such tool is LookSeq, an application that many researchers around the world – myself included – regularly use to visualise and understand changes in DNA sequences. Not only does the tool allow me to benefit from data generously collected and analysed by others, but it also does it in a clever way that minimises server load, ensuring that the system is well fitted for handling increasing amounts of sequence data.
Fittingly, the version of Mediawiki used to power Wikipedia today probably doesn’t contain a single line of Magnus’ original code. Just like Wikipedia’s articles, his open source software has been continuously modified and improved by many collaborators. Nevertheless, if you are one of the millions who have benefited from consulting Wikipedia, or being able to browse the Sanger Institute’s genomic data, today please join me in raising a virtual toast to Magnus.
Darren Logan is a member of Faculty at the Wellcome Trust Sanger Institute, leading the Genetics of instinctive behaviour group.