Residual Sum of Squares and Explained Variance

Description

rss and evar are S4 generic functions that compute, respectively, the residual sum of squares (RSS) and the explained variance achieved by a model.

The explained variance for a target matrix V is computed as:

evar = 1 - RSS / \sum_{i,j} V_{ij}^2,



Usage

rss(object, ...)

## S4 method for signature 'matrix'
rss(object, target)

evar(object, ...)

## S4 method for signature 'ANY'
evar(object, target, ...)

Arguments

object
an R object with a suitable fitted, rss or evar method.
...
extra arguments to allow extension, e.g. passed to rss in evar calls.
target
target matrix

Value

a single numeric value

Details

In the formula for evar given above, RSS is the residual sum of squares between the model estimate and the target matrix (see the Methods section for its definition).

The explained variance is useful for comparing the performance of different models and their ability to accurately reproduce the original target matrix. Note, however, that some models explicitly aim at minimizing the RSS (i.e. at maximizing the explained variance), while others do not, which can make such comparisons misleading.
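To make the formula concrete, here is a minimal base-R sketch of the computation. The helper evar_manual is hypothetical and purely illustrative; it is not the package's S4 evar method.

```r
# Hypothetical base-R sketch of the explained-variance formula;
# the NMF package's actual evar() dispatches through S4 methods.
evar_manual <- function(estimate, target) {
  rss <- sum((estimate - target)^2)  # residual sum of squares
  1 - rss / sum(target^2)            # evar = 1 - RSS / sum of squared target entries
}

V <- matrix(1:6, nrow = 2)  # toy target matrix
evar_manual(V, V)           # perfect reconstruction gives evar = 1
```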

Methods

1. evar, signature(object = "ANY"): Default method for evar.

It requires a suitable rss method to be defined for object, as it internally calls rss(object, target, ...).

2. rss, signature(object = "matrix"): Computes the RSS between a target matrix and its estimate object, which must be a matrix of the same dimensions as target.

The RSS between a target matrix V and its estimate v is computed as:

RSS = \sum_{i,j} (v_{ij} - V_{ij})^2

Internally, the computation is performed by an optimised C++ implementation that is light on memory usage.
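As a sanity check of the formula above, a plain-R version can be written and compared against the squared Frobenius norm of the residual. The helper rss_manual is hypothetical and for illustration only; it is not the package's optimised implementation.

```r
# Hypothetical plain-R version of the RSS formula, for illustration only;
# the package's rss() method uses an optimised implementation instead.
rss_manual <- function(estimate, target) sum((estimate - target)^2)

V <- matrix(1:6, nrow = 2)
v <- V + 0.5
rss_manual(v, V)  # 6 entries, each off by 0.5: 6 * 0.25 = 1.5
# same value as the squared Frobenius norm of the residual matrix
all.equal(rss_manual(v, V), norm(v - V, "F")^2)
```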


3. rss, signature(object = "ANY"): Residual sum of squares between a given target matrix and a model that has a suitable fitted method. It is equivalent to rss(fitted(object), ...).

In the context of NMF, Hutchins et al. (2008) used the variation of the RSS in combination with the algorithm from Lee and Seung (1999) to estimate the correct number of basis vectors. The optimal rank is chosen where the graph of the RSS first shows an inflexion point, i.e. using a screeplot-type criterion. See the section "Rank estimation" in nmf.

Note that this way of estimating the rank may not be suitable for all models. Indeed, if the NMF optimisation problem is not based on the Frobenius norm, the RSS is not directly linked to the quality of approximation of the NMF model. However, it is often the case that the RSS still decreases with the rank.
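The screeplot idea can be illustrated without fitting NMF models, by tracking the RSS of rank-r approximations of a matrix of known low rank. The sketch below uses truncated SVD purely as a self-contained stand-in; with the NMF package one would instead fit nmf(x, r) for each candidate rank and plot rss(fit, x).

```r
# Illustration of the screeplot-type criterion on a matrix of known rank 3.
# Truncated SVD stands in for an NMF fit so the sketch needs no extra packages.
set.seed(1)
x <- matrix(runif(20 * 3), 20) %*% matrix(runif(3 * 10), 3)  # exact rank 3

rss_at_rank <- function(r, m) {
  s <- svd(m)
  approx <- s$u[, 1:r, drop = FALSE] %*% diag(s$d[1:r], r) %*%
    t(s$v[, 1:r, drop = FALSE])
  sum((approx - m)^2)
}

rss_curve <- sapply(1:6, rss_at_rank, m = x)
# the curve decreases steeply up to rank 3, then flattens near zero:
# the first inflexion point marks the rank to retain
```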

References

Hutchins LN, Murphy SM, Singh P and Graber JH (2008). "Position-dependent motif characterization using non-negative matrix factorization." _Bioinformatics (Oxford, England)_, *24*(23), pp. 2684-90. ISSN 1367-4811.

Lee DD and Seung HS (1999). "Learning the parts of objects by non-negative matrix factorization." _Nature_, *401*(6755), pp. 788-91. ISSN 0028-0836.

Examples


# RSS between two random matrices
x <- rmatrix(20, 10, max = 50)
y <- rmatrix(20, 10, max = 50)

rss(x, y)
## [1] 90637

rss(x, x + rmatrix(x, max = 0.1))
## [1] 0.7383

# RSS between an NMF model and a target matrix
x <- rmatrix(20, 10)
y <- rnmf(3, x) # random compatible model
rss(y, x)
## [1] 62.7


# fit a model with nmf(): one should do better
y2 <- nmf(x, 3) # default minimizes the KL-divergence
rss(y2, x)
## [1] 8.047

y2 <- nmf(x, 3, 'lee') # 'lee' minimizes the RSS
rss(y2, x)
## [1] 7.359