Computes outlierness scores and detects outliers.
Outliers(prox, cls=NULL, data=NULL, threshold=10)
prox: A proximity matrix (a square matrix with 1 on the diagonal and values between 0 and 1 in the off-diagonal positions).
cls: Factor. The classes that the rows of the proximity matrix belong to. If NULL (default), all data are assumed to come from the same class.
data: A data frame of variables used to describe the outliers (optional).
threshold: Numeric. The outlierness value above which an observation is considered an outlier. Default is 10.
The outlierness score of a case is computed as n / sum(squared proximity), then normalized by subtracting the median and dividing by the MAD (median absolute deviation), within each class.
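The computation described above can be sketched in a few lines of base R. This is only an illustration, not the package's own code: `outlierness_sketch` is a hypothetical helper, n is assumed to be the total number of cases, the squared proximities are assumed to be summed over cases of the same class, and the MAD is assumed to be R's `mad()` with its default scaling constant.

```r
# Hypothetical helper illustrating the outlierness formula; not the package's code.
outlierness_sketch <- function(prox, cls = NULL) {
  n <- nrow(prox)
  if (is.null(cls)) cls <- rep(1, n)   # a single class when none is given
  cls <- factor(cls)
  scores <- rep(NA_real_, n)
  for (lvl in levels(cls)) {
    in_class <- cls == lvl
    # Raw score: n / sum of squared proximities to cases of the same class
    # (assumption: n is the total number of cases).
    raw <- n / rowSums(prox[in_class, in_class, drop = FALSE]^2)
    # Robust standardisation within the class: centre on the median, scale by the MAD.
    scores[in_class] <- (raw - median(raw)) / mad(raw)
  }
  scores
}

# Toy proximity matrix: cases 1 to 3 are close to one another, case 4 is isolated.
prox <- matrix(c(1.0, 0.9, 0.8, 0.1,
                 0.9, 1.0, 0.9, 0.1,
                 0.8, 0.9, 1.0, 0.2,
                 0.1, 0.1, 0.2, 1.0), nrow = 4, byrow = TRUE)
scores <- outlierness_sketch(prox)
which(scores > 10)   # indexes of cases above the default threshold of 10
```

On this toy matrix only the isolated case 4 exceeds the threshold, because its squared proximities to the other cases are small and its raw score is therefore far above the class median.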
A list with the following elements:
scores: numeric vector containing the outlierness scores
outliers: numeric vector of indexes of the outliers, or a data frame with the outliers and their characteristics
The code is adapted from the outlier function in the randomForest package.
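data(iris)
# Recode Species as a binary factor: versicolor vs. the other species
iris2 = iris
iris2$Species = factor(iris$Species == "versicolor")
# Fit a conditional random forest and extract its proximity matrix
iris.cf = party::cforest(Species ~ ., data = iris2,
                         control = party::cforest_unbiased(mtry = 2, ntree = 50))
prox = party::proximity(iris.cf)
# Score each case within its class and describe outliers by the four measurements
Outliers(prox, iris2$Species, iris2[,1:4])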
#> $scores
#> 1 2 3 4 5 6
#> -0.74046422 -0.34808119 -0.68180577 -0.66297948 -0.74046422 -0.70328790
#> 7 8 9 10 11 12
#> -0.73846064 -0.73846064 0.56902973 -0.66297948 -0.71977558 -0.72255709
#> 13 14 15 16 17 18
#> -0.34808119 -0.34808119 -0.43518936 -0.52426962 -0.71977558 -0.74046422
#> 19 20 21 22 23 24
#> -0.52426962 -0.74046422 -0.68401304 -0.74046422 -0.74046422 -0.65088637
#> 25 26 27 28 29 30
#> -0.51055244 -0.30845419 -0.72255709 -0.74046422 -0.73846064 -0.66402001
#> 31 32 33 34 35 36
#> -0.64453147 -0.70096146 -0.74046422 -0.66717575 -0.66297948 -0.68180577
#> 37 38 39 40 41 42
#> -0.66717575 -0.74046422 -0.34808119 -0.73846064 -0.74046422 0.68491387
#> 43 44 45 46 47 48
#> -0.68180577 -0.48490007 -0.51259770 -0.34808119 -0.72453739 -0.68180577
#> 49 50 51 52 53 54
#> -0.74046422 -0.71653290 1.78457113 1.21285079 4.40255392 -0.62172762
#> 55 56 57 58 59 60
#> 0.96514494 -0.75460283 2.38323677 0.14171035 -0.34634307 0.19435985
#> 61 62 63 64 65 66
#> 0.14171035 0.77201341 -0.42753212 0.50545642 -0.67556495 0.65062904
#> 67 68 69 70 71 72
#> 0.92322877 -0.52870555 1.12043131 -0.70812432 12.44964399 -0.77458574
#> 73 74 75 76 77 78
#> 4.53600492 -0.14171035 -0.51292796 0.30803573 1.76931464 8.11518903
#> 79 80 81 82 83 84
#> 0.59776230 -0.31869614 -0.55276115 -0.15082347 -0.83470576 8.62457411
#> 85 86 87 88 89 90
#> 1.91532971 2.49980713 1.75001253 -0.42541114 -0.18744830 -0.62172762
#> 91 92 93 94 95 96
#> -0.67341657 0.29624472 -0.86155263 0.14171035 -0.84007496 -0.23001927
#> 97 98 99 100 101 102
#> -0.84284951 -0.60168807 -0.26544199 -0.84284951 0.25397508 0.93965195
#> 103 104 105 106 107 108
#> 0.01437954 1.00791500 -0.01929064 0.01437954 10.95458350 0.85563555
#> 109 110 111 112 113 114
#> 0.90439245 0.21862872 0.39056457 0.22586169 -0.01039710 1.30308384
#> 115 116 117 118 119 120
#> 0.87537239 0.07473864 0.57087483 0.21862872 0.22586169 8.06418721
#> 121 122 123 124 125 126
#> 0.04762001 1.82915849 0.19598723 2.60004598 0.05275366 0.77777175
#> 127 128 129 130 131 132
#> 3.34645872 2.95064323 0.17085179 3.60477351 0.19598723 0.21862872
#> 133 134 135 136 137 138
#> 0.17085179 5.48665460 6.98180163 0.01437954 0.37443820 0.70497412
#> 139 140 141 142 143 144
#> 3.71121083 0.01945522 0.01039710 0.40420287 0.93965195 0.04762001
#> 145 146 147 148 149 150
#> 0.05275366 0.05888676 1.17695241 0.05888676 0.40930102 2.06522594
#>
#> $outliers
#> rowname Sepal.Length Sepal.Width Petal.Length Petal.Width scores
#> 1 71 5.9 3.2 4.8 1.8 12.44964
#> 2 107 4.9 2.5 4.5 1.7 10.95458
#>