The function `featureScore` implements different methods to compute basis-specificity scores for each feature in the data.

The function `extractFeatures` implements different methods to select the most basis-specific features of each basis component.

- `featureScore(object, ...)`
- S4 method for signature `matrix`: `featureScore(object, method = c("kim", "max"))`
- `extractFeatures(object, ...)`
- S4 method for signature `matrix`: `extractFeatures(object, method = c("kim", "max"), format = c("list", "combine", "subset"), nodups = TRUE)`

- `object`: an object from which scores/features are computed/extracted.
- `...`: extra arguments to allow extension.
- `method`: scoring or selection method. It specifies the name of one of the methods described in sections *Feature scores* and *Feature selection*. Additionally, for `extractFeatures`, it may be an integer vector that indicates the number of top most contributing features to extract from each column of `object`, when ordered in decreasing order, or a numeric value between 0 and 1 that indicates the minimum relative basis contribution above which a feature is selected (i.e. basis contribution threshold). In the case of a single numeric value (integer or percentage), it is used for all columns. Note that `extractFeatures(x, 1)` means a relative contribution threshold of 100%; to select the top contributing feature, one must explicitly specify an integer value, as in `extractFeatures(x, 1L)`. However, if all elements in `method` are > 1, they are automatically treated as if they were integers: `extractFeatures(x, 2)` means the top-2 most contributing features in each component.
- `format`: output format. The following values are accepted:
    - ‘list’ (default) returns a list with one element per column in `object`, each containing the indexes of the selected features, as an integer vector. If `object` has row names, these are used to name each index vector. Components for which no feature was selected are assigned an `NA` value.
    - ‘combine’ returns all indexes in a single vector. Duplicated indexes are made unique if `nodups=TRUE` (default).
    - ‘subset’ returns an object of the same class as `object`, but subset with the selected indexes, so that it contains data only from basis-specific features.

- `nodups`: logical that indicates whether duplicated indexes, i.e. features selected on multiple basis components (which should in theory not happen), should appear only once in the result. Only used when `format='combine'`.
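The rules above for a numeric `method` in `extractFeatures` can be summarised by a small dispatch function. This is a minimal base-R sketch, not the package's implementation: the helper name `interpret_method` and its return structure are hypothetical, and it encodes only the rules stated in the `method` description (integer means top-n, a double in [0, 1] means a threshold, doubles that are all > 1 behave as integers, a single value is recycled).

```r
# Hypothetical helper (not part of the NMF package): decide how a numeric
# `method` value passed to extractFeatures would be interpreted.
interpret_method <- function(method, ncomp) {
  # Doubles that are all > 1 are treated as integers, e.g. 2 behaves as 2L.
  if (is.double(method) && all(method > 1)) {
    method <- as.integer(method)
  }
  # Integers give the number of top features; doubles give a relative
  # contribution threshold between 0 and 1.
  mode <- if (is.integer(method)) "top" else "threshold"
  # A single value is recycled to every basis component (column).
  if (length(method) == 1L) method <- rep(method, ncomp)
  list(mode = mode, value = method)
}

interpret_method(1, 3)$mode   # "threshold": a 100% contribution threshold
interpret_method(1L, 3)$mode  # "top": the top-1 feature per component
interpret_method(2, 3)$value  # 2 2 2, i.e. the top-2 features per component
```

The distinction between `1` and `1L` is why the integer suffix matters: R stores `1` as a double, so without the "all > 1" rule it would always be read as a threshold.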

`featureScore` returns a numeric vector whose length is the number of rows in `object` (i.e. one score per feature).

`extractFeatures` returns the selected features as a list of indexes, a single integer vector, or an object of the same class as `object` that only contains the selected features.
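To make the difference between the ‘list’ and ‘combine’ formats concrete, the sketch below flattens a per-component list of indexes into a single vector, with and without `nodups`. It is a base-R illustration only: `combine_features` is a hypothetical name, the index values are made up, and dropping the `NA` placeholders of empty components is an assumption about how they are handled.

```r
# Toy result in 'list' format: one integer vector of row indexes per basis
# component; the second component has no selected feature (NA).
sel <- list(c(1L, 4L), NA_integer_, c(4L, 7L))

# Hypothetical helper: flatten to 'combine' format.
combine_features <- function(sel, nodups = TRUE) {
  idx <- unlist(sel, use.names = FALSE)
  idx <- idx[!is.na(idx)]           # assumption: NA placeholders are dropped
  if (nodups) unique(idx) else idx  # feature 4 was selected twice
}

combine_features(sel)                  # 1 4 7
combine_features(sel, nodups = FALSE)  # 1 4 4 7

# 'subset'-like view on a toy matrix: keep only the selected rows.
W <- matrix(runif(30), nrow = 10)
dim(W[combine_features(sel), , drop = FALSE])  # 3 3
```

In the package, ‘subset’ returns an object of the same class as `object` (for example an NMF model), so the plain-matrix row subset above only illustrates the idea.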

One of the properties of Nonnegative Matrix Factorization is that it tends to produce sparse representations of the observed data, which leads to a natural application to biclustering, i.e. characterising groups of samples by a small number of features.

In NMF models, samples are grouped according to the basis components that contribute the most to each sample, i.e. the basis components that have the greatest coefficient in each column of the coefficient matrix (see `predict,NMF-method`). Each group of samples is then characterised by a set of features selected based on basis-specificity scores that are computed on the basis matrix.
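The grouping rule in the previous paragraph (assign each sample to the basis component with the greatest coefficient in its column) can be sketched in base R on a toy coefficient matrix. This only mirrors the rule as stated; it is not the code of `predict,NMF-method`, and the matrix `H` is made-up example data.

```r
# Toy coefficient matrix H: 2 basis components (rows) x 4 samples (columns).
# matrix() fills column-wise, so each line below is one sample.
H <- matrix(c(0.9, 0.1,
              0.2, 0.8,
              0.6, 0.4,
              0.3, 0.7), nrow = 2)

# For each sample (column), the index of the component with the largest
# coefficient; ties resolve to the first maximum.
groups <- apply(H, 2, which.max)
groups  # 1 2 1 2
```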

- `extractFeatures` `signature(object = "matrix")`: Selects features on a given matrix that contains the basis components in columns.
- `extractFeatures` `signature(object = "NMF")`: Selects basis-specific features from an NMF model, by applying the method `extractFeatures,matrix` to its basis matrix.
- `featureScore` `signature(object = "matrix")`: Computes feature scores on a given matrix that contains the basis components in columns.
- `featureScore` `signature(object = "NMF")`: Computes feature scores on the basis matrix of an NMF model.

The function `featureScore` can compute basis-specificity scores using the following methods:

- ‘kim’: method defined by Kim et al. (2007). The score for feature `i` is defined as:

  $$S_i = 1 + \frac{1}{\log_2 k} \sum_{q=1}^{k} p(i,q) \log_2 p(i,q),$$

  where `k` is the number of basis components (columns of `W`) and `p(i,q)` is the probability that the `i`-th feature contributes to basis `q`:

  $$p(i,q) = \frac{W(i,q)}{\sum_{r=1}^{k} W(i,r)}.$$

  The feature scores are real values within the range [0,1]. The higher the feature score, the more basis-specific the corresponding feature.

- ‘max’: method defined by Carmona-Saez et al. (2006). The feature scores are defined as the row maximums.
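Both scoring rules above translate directly into a few lines of base R. The sketch below is written from the formulas, not taken from the package: `kim_score` and `max_score` are hypothetical names, and it assumes a nonnegative `W` with no all-zero rows and uses the usual convention that 0 log2(0) = 0.

```r
# Kim et al. (2007) basis-specificity score for each row (feature) of W.
kim_score <- function(W) {
  k <- ncol(W)
  P <- W / rowSums(W)                     # p(i,q) = W(i,q) / sum_r W(i,r)
  plogp <- ifelse(P > 0, P * log2(P), 0)  # convention: 0 * log2(0) = 0
  1 + rowSums(plogp) / log2(k)
}

# 'max' score: the row maximums of W.
max_score <- function(W) apply(W, 1, max)

W <- rbind(c(1, 0, 0),  # contributes to a single basis only
           c(1, 1, 1),  # spread evenly over all bases
           c(3, 1, 0))
kim_score(W)  # 1 for row 1, 0 (up to rounding) for row 2, about 0.49 for row 3
max_score(W)  # 1 1 3
```

The example shows the two extremes of the Kim score: a feature loading on a single component scores 1, while one spread evenly across all `k` components scores 0.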

The function `extractFeatures` can select features using the following methods:

- ‘kim’ uses the Kim et al. (2007) scoring scheme and feature selection method. The features are first scored using the function `featureScore` with method ‘kim’. Then only the features that fulfil both of the following criteria are retained:
    - the score is greater than `\hat{\mu} + 3 \hat{\sigma}`, where `\hat{\mu}` and `\hat{\sigma}` are the median and the median absolute deviation (MAD) of the scores respectively;
    - the maximum contribution to a basis component is greater than the median of all contributions (i.e. of all elements of W).

- ‘max’ uses the selection method used in the `bioNMF` software package and described in Carmona-Saez et al. (2006). For each basis component, the features are first sorted by decreasing contribution. Then only the first consecutive features whose highest contribution in the basis matrix is on the considered basis component are selected.
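The two selection procedures above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: `kim_select` and `max_select` are hypothetical names, `\hat{\sigma}` is taken to be R's `mad()` with its default scaling constant, and for ‘kim’ each retained feature is assumed to be assigned to the component where it contributes most. Empty components are returned as `integer(0)` rather than the `NA` used by the documented ‘list’ format.

```r
# 'kim' selection: high Kim score AND a large maximum contribution.
kim_select <- function(W) {
  k <- ncol(W)
  P <- W / rowSums(W)
  s <- 1 + rowSums(ifelse(P > 0, P * log2(P), 0)) / log2(k)
  keep <- s > median(s) + 3 * mad(s) &  # assumption: default mad() scaling
          apply(W, 1, max) > median(W)  # max contribution > median of all W
  best <- apply(W, 1, which.max)        # assumption: feature -> its top basis
  lapply(seq_len(k), function(q) which(keep & best == q))
}

# 'max' selection: for each component, sort features by decreasing
# contribution and keep the leading run whose row maximum is on that component.
max_select <- function(W) {
  best <- apply(W, 1, which.max)
  lapply(seq_len(ncol(W)), function(q) {
    ord <- order(W[, q], decreasing = TRUE)
    on_q <- best[ord] == q
    n <- if (all(on_q)) length(ord) else which(!on_q)[1] - 1L
    ord[seq_len(n)]
  })
}

W <- rbind(c(5, 1),
           c(4, 0),
           c(1, 6),
           c(3, 3.5),
           c(2, 0.5))
max_select(W)  # list(c(1, 2), c(3, 4)): row 5 is skipped, row 4 breaks the run
kim_select(W)  # list(2, integer(0))
```

In the ‘max’ example, row 5 contributes most to component 1 but is not selected for it, because row 4 (whose maximum is on component 2) comes before it in the sorted order and ends the consecutive run.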

Kim H and Park H (2007). "Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis." _Bioinformatics (Oxford, England)_, *23*(12), pp. 1495-502. ISSN 1460-2059.

Carmona-Saez P, Pascual-Marqui RD, Tirado F, Carazo JM and Pascual-Montano A (2006). "Biclustering of gene expression data by Non-smooth Non-negative Matrix Factorization." _BMC Bioinformatics_, *7*, p. 78. ISSN 1471-2105.