Validate new datasets

validate(
  dataset,
  RAVmodel,
  method = "pearson",
  maxFrom = "PC",
  level = "max",
  scale = FALSE
)

Arguments

dataset

A single SummarizedExperiment, RangedSummarizedExperiment, ExpressionSet, or matrix object, or a named list of such objects. Gene names should be in 'symbol' format. Currently, each dataset should have at least 8 samples.
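
As a minimal sketch of the input shapes described above, the following base R code builds two toy expression matrices, puts them in a named list, and checks the 8-sample minimum. The genes-in-rows, samples-in-columns layout is an assumption based on the usual SummarizedExperiment convention, and `make_matrix` and `has_enough_samples` are hypothetical helpers, not part of the package.

```r
set.seed(1)
genes <- c("TP53", "BRCA1", "EGFR", "MYC", "KRAS")

# Toy expression matrix: gene symbols in rows, samples in columns (assumed layout).
make_matrix <- function(n_samples) {
  matrix(rnorm(length(genes) * n_samples),
         nrow = length(genes),
         dimnames = list(genes, paste0("sample", seq_len(n_samples))))
}

# A named list, the form used when several datasets are supplied at once.
datasets <- list(studyA = make_matrix(10), studyB = make_matrix(6))

# Hypothetical helper: TRUE when a dataset meets the 8-sample minimum.
has_enough_samples <- function(x, min_samples = 8) ncol(x) >= min_samples

ok <- vapply(datasets, has_enough_samples, logical(1))
print(ok)
```

Here `studyB` has only 6 samples, so it would need to be dropped or extended before being passed as dataset.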

RAVmodel

A PCAGenomicSignatures object.

method

A character string indicating which correlation coefficient is to be computed. One of "pearson" (default), "kendall", or "spearman"; the name can be abbreviated.

maxFrom

Select whether to display the maximum value from the dataset's PCs or from the avgLoadings. Under the default (maxFrom = "PC"), the output contains, for each avgLoading, the maximum correlation coefficient among the top 8 PCs. With maxFrom = "avgLoading", the output contains, for each PC, the avgLoading that has the maximum correlation coefficient with that PC.
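
The two maxFrom settings can be pictured as taking maxima along different margins of a PC-by-RAV correlation matrix. The sketch below uses a simulated matrix and base R only; it illustrates the idea under that assumption and is not the package's internal code. The object names (`cors`, `max_from_pc`, `max_from_avg`) are made up for this example.

```r
set.seed(42)
# Simulated correlation matrix: 8 PCs (rows) x 3 RAVs (columns).
cors <- matrix(runif(24), nrow = 8,
               dimnames = list(paste0("PC", 1:8), paste0("RAV", 1:3)))

# maxFrom = "PC": for each avgLoading (column), keep the best-matching PC.
max_from_pc <- data.frame(
  score = apply(cors, 2, max),
  PC    = apply(cors, 2, which.max)
)

# maxFrom = "avgLoading": for each PC (row), keep the best-matching RAV.
max_from_avg <- data.frame(
  score         = apply(cors, 1, max),
  validated_RAV = colnames(cors)[apply(cors, 1, which.max)],
  row.names     = rownames(cors)
)

print(max_from_pc)
print(max_from_avg)
```

The first result has one row per RAV and the second has one row per PC, which mirrors the two output shapes in the Examples section.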

level

Output format of the validated result. Two options are available: "max" and "all". The default, "max", outputs a matrix containing only the maximum coefficient. To get the coefficients for all 8 PCs, set this argument to "all". level = "all" can be used with only one dataset.

scale

Default is FALSE. If set to TRUE, the dataset will be row-normalized.
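
The text does not define row normalization further. One common reading, assumed here, is standardizing each row (gene) to mean 0 and standard deviation 1 across samples. A minimal base R sketch of that interpretation, on simulated data:

```r
set.seed(7)
# Simulated expression matrix: 5 genes (rows) x 10 samples (columns).
x <- matrix(rnorm(5 * 10, mean = 5, sd = 2), nrow = 5,
            dimnames = list(paste0("gene", 1:5), paste0("sample", 1:10)))

# base::scale() standardizes columns, so transpose, scale, and transpose back
# to standardize each row (gene) instead.
row_normalized <- t(scale(t(x)))

# Each row now has mean ~0 and standard deviation ~1.
print(round(rowMeans(row_normalized), 10))
print(round(apply(row_normalized, 1, sd), 10))
```

Whether the package applies exactly this transformation is not stated in this page, so treat the sketch as an illustration of the concept only.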

Value

A data frame containing the maximum Pearson correlation coefficient (score column) between the top 8 PCs of the dataset and the pre-calculated average loadings of the training datasets (one RAV per row). It also contains other metadata associated with each RAV: PC is the one of the top 8 PCs of the dataset that yields the given score, sw is the average silhouette width of the RAV, and cl_size is the size of each RAV. The example output below also shows a cl_num column holding the RAV number.

If the input for the dataset argument is a list of different datasets, each row of the output represents a test dataset and each column represents a cluster from the training datasets. If level = "all", the output is a list containing the matrices of Pearson correlation coefficients between all top 8 PCs of the datasets and avgLoading.
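
To make the score and PC columns concrete, here is a rough sketch of the computation described above using base R and simulated data: run PCA on a dataset, correlate the gene loadings of the top 8 PCs with a set of average loadings, and keep the maximum per RAV. The `avg_loadings` matrix is a simulated stand-in for a model's average loadings, and taking the absolute value of the correlation is an assumption. This is not the package's implementation.

```r
set.seed(123)
n_genes <- 50
n_samples <- 12
genes <- paste0("gene", seq_len(n_genes))

# Simulated dataset: genes in rows, samples in columns.
expr <- matrix(rnorm(n_genes * n_samples), nrow = n_genes,
               dimnames = list(genes, paste0("s", seq_len(n_samples))))

# Simulated stand-in for average loadings: genes x RAVs.
avg_loadings <- matrix(rnorm(n_genes * 4), nrow = n_genes,
                       dimnames = list(genes, paste0("RAV", 1:4)))

# PCA with samples as observations; `rotation` holds gene loadings per PC.
pca <- prcomp(t(expr))
top_pcs <- pca$rotation[, 1:8]

# Correlate each of the top 8 PCs with each RAV.
# Using abs() is an assumption made for this sketch.
cors <- abs(cor(top_pcs, avg_loadings, method = "pearson"))

# One row per RAV: the best score and the index of the PC that produced it.
result <- data.frame(score = apply(cors, 2, max),
                     PC    = apply(cors, 2, which.max))
print(result)
```

The full `cors` matrix corresponds loosely to what level = "all" exposes, while `result` corresponds to the default level = "max" summary.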

Examples

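library(GenomicSuperSignature)
library(bcellViper)
data(miniRAVmodel)   # small example RAVmodel
data(bcellViper)     # loads the ExpressionSet `dset`
# Default: for each RAV, the best-matching PC among the dataset's top 8 PCs
validate(dset, miniRAVmodel)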
#>             score PC           sw cl_size cl_num
#> RAV1076 0.5950767  2 -0.044471242      10   1076
#> RAV338  0.5709072  2 -0.046833188      21    338
#> RAV1467 0.5695904  2 -0.047094024       6   1467
#> RAV1614 0.5308258  2 -0.075672871      13   1614
#> RAV294  0.5130476  2 -0.022418144       6    294
#> RAV3071 0.5100487  2 -0.009615286       6   3071
#> RAV1694 0.5090028  2 -0.055182792      20   1694
#> RAV438  0.5088604  2  0.035822199       6    438
#> RAV725  0.4993058  2  0.094127339      20    725
#> RAV1497 0.4980072  2  0.130408751      12   1497
#> RAV501  0.4897364  2  0.064985019       7    501
#> RAV941  0.4872848  2 -0.020971183       3    941
#> RAV2538 0.5838616  2  0.069961659       4   2538
#> RAV1139 0.5622719  2  0.085369376       4   1139
#> RAV884  0.5451752  2  0.153286975       6    884
#> RAV695  0.2415454  3  0.105812565       2    695
#> RAV953  0.2145322  2 -0.004345963       3    953
#> RAV1994 0.1729988  3 -0.034939005       3   1994
#> RAV312  0.3774798  8 -0.066746154      24    312
#> RAV468  0.3989300  8  0.124024704       7    468
validate(dset, miniRAVmodel, maxFrom = "avgLoading")
#>         score validated_RAV
#> PC1 0.2285693       RAV1139
#> PC2 0.5950767       RAV1076
#> PC3 0.2415454        RAV695
#> PC4 0.2282201        RAV338
#> PC5 0.2129687       RAV1139
#> PC6 0.2126171        RAV338
#> PC7 0.1762712        RAV468
#> PC8 0.3989300        RAV468
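
Because the default output is a data frame, it can be filtered and sorted with ordinary base R. The sketch below re-enters a few rows of the first example output by hand, so it runs without the package, and keeps RAVs with a reasonably high score and a positive silhouette width. The cutoffs (0.45 and 0) are arbitrary values chosen for illustration, not recommendations from the package.

```r
# Selected rows copied from the example output above.
res <- data.frame(
  score   = c(0.5950767, 0.5709072, 0.5088604, 0.4993058, 0.2415454),
  PC      = c(2, 2, 2, 2, 3),
  sw      = c(-0.044471242, -0.046833188, 0.035822199, 0.094127339, 0.105812565),
  cl_size = c(10, 21, 6, 20, 2),
  row.names = c("RAV1076", "RAV338", "RAV438", "RAV725", "RAV695")
)

# Illustrative thresholds only.
keep <- res[res$score > 0.45 & res$sw > 0, ]
keep <- keep[order(keep$score, decreasing = TRUE), ]
print(keep)
```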