Thursday, May 21, 2009

Integrating NeXML into BioPerl (1st Post)

This summer I'll be working with Mark A. Jensen and Rutger Vos to bring native NeXML functionality to BioPerl. The abstract and background below give an idea of the project. Check back for updates!

Abstract:
This project will integrate the NeXML exchange standard into BioPerl, facilitating adoption of the standard and easing the transition away from the overworked NEXUS standard. A wrapper will give BioPerl native access to the preferred NeXML parser (Bio::Phylo), allowing Bio::Phylo and NeXML to co-evolve without being encumbered by BioPerl. Additionally, test cases and example sets targeting real-world uses will be developed.

Background:
Phylogenetic trees, like all visual data representations, require special consideration to represent their complete structure in a linear, non-visual data format. The currently accepted standard is the NEXUS file format, which is highly expressive and popular. However, it has become increasingly cumbersome to use for three main reasons: “(i) deficiencies in the standard itself, (ii) variable level of support for standard feature in current software, and (iii) an extremely high incidence of illegal files (a byproduct of the fact that most parsers are ‘forgiving’ of errors)” [1]. These problems have prompted a move away from NEXUS toward a more flexible, extensible, and strictly defined standard.

One candidate is the NeXML exchange standard, which, in addition to being more extensible and more strictly defined, leverages existing XML tools and capabilities (such as parser libraries, web service toolkits, and serialization) to make a more robust product [2]. One current barrier to wider adoption of NeXML is the lack of support for it in popular programming tools. One way to lower that barrier is to incorporate the preferred NeXML parser (Bio::Phylo) into BioPerl, a popular bioinformatics framework. However, the NeXML standard is evolving rapidly, and its development would be hampered by having to interact directly with the large BioPerl project.

To avoid this pitfall, a lightweight dynamic wrapper will be created around Bio::Phylo. In this way, BioPerl users will gain access to the full suite of functionality provided by Bio::Phylo without encumbering future development of the NeXML standard or the Bio::Phylo module. To encourage use of the new wrapper and the NeXML standard, test cases and example sets targeting real-world uses will be developed.

This project will be an important step in the transition away from an overworked standard to the more powerful and extensible NeXML exchange standard, increasing productivity along the way.