NMF: A flexible R package for Nonnegative Matrix Factorization
BMC Bioinformatics 2010, 11:367 - doi:10.1186/1471-2105-11-367 - PMCID: PMC2912887

Renaud Gaujoux¹, Cathal Seoighe*²

(1) Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Observatory 7925, South Africa
(2) School of Mathematics, Statistics and Applied Mathematics, National University of Ireland Galway, Ireland
* Corresponding author

Authors: Renaud Gaujoux - <renaud at cbio dot uct dot ac dot za>; Cathal Seoighe - <cathal dot seoighe at nuigalway dot ie>

Abstract

Background: Nonnegative Matrix Factorization (NMF) is an unsupervised learning technique that has been applied successfully in several fields, including signal processing, face recognition and text mining. Recent applications of NMF in bioinformatics have demonstrated its ability to extract meaningful information from high-dimensional data such as gene expression microarrays. Developments in NMF theory and applications have resulted in a variety of algorithms and methods. However, most NMF implementations have been on commercial platforms, while those that are freely available typically require programming skills. This limits their use by the wider research community.

Results: Our objective is to provide the bioinformatics community with an open-source, easy-to-use and unified interface to standard NMF algorithms, as well as with a simple framework to help implement and test new NMF methods. For that purpose, we have developed a package for the R/BioConductor platform. The package ports public code to R, and is structured to enable users to easily modify and/or add algorithms. It includes a number of published NMF algorithms and initialization methods and facilitates the combination of these to produce new NMF strategies. Commonly used benchmark data and visualization methods are provided to help in the comparison and interpretation of the results.

Conclusions: The NMF package helps realize the potential of Nonnegative Matrix Factorization, especially in bioinformatics, providing easy access to methods that have already yielded new insights in many applications. Documentation, source code and sample data are available from CRAN at http://cran.r-project.org.

Acknowledgements

Funding: This work was funded by the South-African National Bioinformatics Network. Cathal Seoighe is funded through Science Foundation Ireland (Grant number 07/SK/M1211b).