rss and evar are S4 generic functions that respectively compute the Residual Sum of Squares (RSS) and the explained variance achieved by a model.

The explained variance for a target matrix V is computed as:

evar = 1 - RSS / \sum_{i,j} V_{ij}^2,

where RSS is the residual sum of squares.

Usage:

rss(object, ...)
## S4 method for signature 'matrix'
rss(object, target)

evar(object, ...)
## S4 method for signature 'ANY'
evar(object, target, ...)

Arguments:

object: an R object with a suitable fitted, rss or evar method.
target: the target matrix.
...: extra arguments, e.g. passed to rss in evar calls.

Value: a single numeric value.
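As an informal check (a sketch only, assuming the NMF package is loaded and a model has been fitted), the value returned by evar can be recomputed directly from this definition using rss and the target matrix:

# recompute evar from its definition: 1 - RSS / sum of squared target entries
V <- rmatrix(20, 10)
fit <- nmf(V, 3)
evar(fit, V)
1 - rss(fit, V) / sum(V^2)   # should match the value above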
The explained variance is useful to compare the performance of different models and their ability to accurately reproduce the original target matrix. Note, however, that a possible caveat is that some models explicitly aim at minimizing the RSS (i.e. maximizing the explained variance), while others do not.
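For instance, one may compare two fits of the same target via their explained variance (a sketch only; 'brunet' and 'lee' are algorithm names from the NMF package, and results vary with the random seed):

# compare the explained variance achieved by two algorithms on the same target
V <- rmatrix(30, 15)
evar(nmf(V, 3, 'brunet'), V)   # default algorithm, minimizes the KL-divergence
evar(nmf(V, 3, 'lee'), V)      # 'lee' explicitly minimizes the RSS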
signature(object = "ANY")
: Default
method for evar
.
It requires a suitable rss
method to be defined
for object
, as it internally calls
rss(object, target, ...)
.
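As a sketch of this dispatch mechanism (the class 'myFit' and its slot are hypothetical, not part of the NMF package), defining an rss method for a custom class is enough for the default evar method to work:

# hypothetical S4 class that stores an estimate of the target matrix
setClass('myFit', representation(estimate = 'matrix'))
# delegate rss() to the matrix method applied to the stored estimate
setMethod('rss', 'myFit', function(object, ...) rss(object@estimate, ...))
fit <- new('myFit', estimate = rmatrix(20, 10))
V <- rmatrix(20, 10)
evar(fit, V)   # default evar method: internally calls rss(fit, V)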
signature(object = "matrix")
: Computes
the RSS between a target matrix and its estimate
object
, which must be a matrix of the same
dimensions as target
.
The RSS between a target matrix V
and its estimate
v
is computed as:
RSS = \sum_{i,j} (v_{ij} - V_{ij})^2Internally, the computation is performed using an optimised C++ implementation, that is light in memory usage.
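This formula can be checked directly in plain R (a sketch with two random matrices):

# rss() on two matrices matches the sum of squared differences
V <- rmatrix(5, 4)
v <- rmatrix(5, 4)
rss(v, V)
sum((v - V)^2)   # same value, computed from the formula above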
signature(object = "ANY")
: Residual sum
of square between a given target matrix and a model that
has a suitable fitted
method. It is
equivalent to rss(fitted(object), ...)
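This equivalence can be verified on a fitted NMF model (a sketch; fitted returns the estimate of the target matrix built from the model):

# rss on the model and on its fitted estimate give the same value
V <- rmatrix(20, 10)
fit <- nmf(V, 3)
rss(fit, V)
rss(fitted(fit), V)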
In the context of NMF, Hutchins et al. (2008) used the variation of the RSS in combination with the algorithm from Lee et al. (1999) to estimate the correct number of basis vectors. The optimal rank is chosen where the graph of the RSS first shows an inflexion point, i.e. using a scree plot-type criterion (see the sketch after the note below). See section Rank estimation in nmf.
Note that this way of estimating the rank may not be suitable for all models. Indeed, if the NMF optimisation problem is not based on the Frobenius norm, the RSS is not directly linked to the quality of approximation of the NMF model. However, it is often the case that it still decreases with the rank.
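The following sketch illustrates the inflexion-point approach on a small random matrix (purely illustrative; real data would call for a wider range of ranks and multiple runs per rank):

# survey the RSS over a small range of ranks and look for an inflexion point
V <- rmatrix(50, 20)
ranks <- 2:6
rss.values <- sapply(ranks, function(r) rss(nmf(V, r, 'lee'), V))
plot(ranks, rss.values, type = 'b', xlab = 'rank', ylab = 'RSS')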
Hutchins LN, Murphy SM, Singh P and Graber JH (2008).
"Position-dependent motif characterization using
non-negative matrix factorization." _Bioinformatics
(Oxford, England)_, *24*(23), pp. 2684-90. ISSN 1367-4811.
Lee DD and Seung HS (1999). "Learning the parts of
objects by non-negative matrix factorization." _Nature_,
*401*(6755), pp. 788-91. ISSN 0028-0836.
# RSS between random matrices
x <- rmatrix(20,10, max=50)
y <- rmatrix(20,10, max=50)
rss(x, y)
## [1] 90637
rss(x, x + rmatrix(x, max=0.1))
## [1] 0.7383
# RSS between an NMF model and a target matrix
x <- rmatrix(20, 10)
y <- rnmf(3, x) # random compatible model
rss(y, x)
## [1] 62.7
# fit a model with nmf(): one should do better
y2 <- nmf(x, 3) # default minimizes the KL-divergence
rss(y2, x)
## [1] 8.047
y2 <- nmf(x, 3, 'lee') # 'lee' minimizes the RSS
rss(y2, x)
## [1] 7.359