R - prcomp and ggbiplot: invalid 'rot' value
I'm trying to do a PCA analysis of my data using R, and found this nice guide, using prcomp and ggbiplot. My data is 2 sample types with 3 biological replicates each (i.e. 6 rows) and around 20000 genes (i.e. variables). First, getting the PCA model with the code described in the guide doesn't work:
> pca=prcomp(data,center=T,scale.=T)
Error in prcomp.default(data, center = T, scale. = T) :
  cannot rescale a constant/zero column to unit variance
However, if I remove the scale. = T part, it works fine and I get a model. Why is this, and is this the cause of the error below?
> summary(pca)
Importance of components:
                             PC1       PC2       PC3       PC4       PC5
Standard deviation     4662.8657 3570.7164 2717.8351 1419.3137 819.15844
Proportion of Variance    0.4879    0.2861    0.1658    0.0452   0.01506
Cumulative Proportion     0.4879    0.7740    0.9397    0.9849   1.00000
Secondly, plotting the PCA. Using just the basic code, I get an error and an empty plot image:
> ggbiplot(pca)
Error: invalid 'rot' value
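What does this mean, and how can I fix it? Does it have something to do with the (non)scaling when making the PCA, or is it something different? It must be something with my data, I think, since if I use the standard example code (below) I get a nice PCA plot.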
> data(wine)
> wine.pca=prcomp(wine,scale.=TRUE)
> print(ggbiplot(wine.pca, obs.scale = 1, var.scale = 1, groups = wine.class, ellipse = TRUE, circle = TRUE))
[Edit 1] I have tried subsetting my data in 2 ways: 1) remove the columns where all rows are 0, and 2) remove the columns where any rows are 0. The first subsetting still gives me the scale error, but not the ones where I have removed columns with any 0's. Why is this? How does that affect my PCA?
Also, I tried doing this using the normal biplot command on both the original data (non-scaled) and the subsetted data above, and it works in both cases. So is it something to do with ggbiplot?
[Edit 2] I have uploaded a subset of the data that gives me the error when I don't remove the zeroes and works when I do. I haven't used gist before, but I think this is it. Or this...
After transposing the data, I was able to replicate your error. The first error is the primary problem. PCA seeks to maximize the variance of each component, so it is important that it doesn't focus on just one variable that may have very high variance. The first error:
Error in prcomp.default(tdf, center = T, scale. = T) :
  cannot rescale a constant/zero column to unit variance
This is telling you that some of your variables have zero variance (i.e. no variability). Since PCA is trying to group things by maximizing variance, there is no point in retaining these variables. They can be removed with the following call:
df_f <- data[,apply(data, 2, var, na.rm=TRUE) != 0]
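As a minimal sketch of what this filter does, assuming a small made-up matrix (the name `toy` and its values are illustrative, not taken from the question's data), the following compares it with dropping only all-zero columns. A column that is constant but nonzero survives the all-zero filter yet still has zero variance, which may be why the first subsetting in the question still triggers the scale error.

```r
# Toy data: 6 samples (rows) x 4 hypothetical genes (columns).
toy <- cbind(
  geneA = c(1, 4, 2, 8, 5, 7),   # varies
  geneB = rep(0, 6),             # all zeros
  geneC = rep(3, 6),             # constant but nonzero
  geneD = c(0, 2, 0, 1, 3, 0)    # varies, contains some zeros
)

# Filter 1: keep columns that have at least one nonzero row.
keep_nonzero <- colSums(toy != 0) > 0

# Filter 2 (the one from the answer): keep columns with nonzero variance.
keep_variable <- apply(toy, 2, var, na.rm = TRUE) != 0

# Columns kept by each filter.
print(names(which(keep_nonzero)))
print(names(which(keep_variable)))
```

Only the variance filter drops `geneC`, so it is the stricter of the two and the one that matches the check prcomp performs when scaling.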
Once you filter, the remaining calls work appropriately:
pca=prcomp(df_f,center=T,scale.=T)
ggbiplot(pca)
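To try the whole sequence without the original data, here is a rough, self-contained sketch on synthetic values. The matrix `expr`, its dimensions, and the constant column `gene3` are assumptions made up for illustration; only `prcomp`, `apply`, and `var` from base R are used.

```r
set.seed(1)  # reproducible made-up data

# 6 samples x 5 hypothetical genes; gene3 is deliberately made constant.
expr <- matrix(rnorm(30, mean = 10, sd = 2), nrow = 6,
               dimnames = list(paste0("sample", 1:6), paste0("gene", 1:5)))
expr[, "gene3"] <- 5

# Scaling the unfiltered data fails because gene3 has zero variance.
# tryCatch captures the error message instead of stopping the script.
res <- tryCatch(prcomp(expr, center = TRUE, scale. = TRUE),
                error = function(e) conditionMessage(e))
print(res)

# Drop zero-variance columns, after which the scaled PCA runs.
expr_f <- expr[, apply(expr, 2, var, na.rm = TRUE) != 0]
pca <- prcomp(expr_f, center = TRUE, scale. = TRUE)
print(summary(pca))
```

From here, `biplot(pca)` or, if the ggbiplot package is installed, `ggbiplot(pca)` can be called on the filtered model.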