r - Weighting k Means Clustering by number of observations
I want to cluster some data using k-means in R. The data looks as follows:
adp ns cntr pp2v eml pp1v addps fb pp1d adr isv pp2d adsem sumall conv
  2  0    0    1   0    0     0  0    0  12   0   12     0     53    0
  2  0    0    1   0    0     0  0    0  14   0   25     0     53    0
  2  0    0    1   0    0     0  0    0  15   0    0     0     53    0
  2  0    0    1   0    0     0  0    0  15   0    4     0     53    0
  2  0    0    1   0    0     0  0    0  17   0    0     0     53    0
  2  0    0    1   0    0     0  0    0  18   0    0     0    106    0
  2  0    0    1   0    0     0  0    0  23   0   10     0     53    0
  2  0    0    1   0    0     1  0    0   0   0    1     0    106    0
  2  0    0    1   0    0     3  0    0   0   0    0     0     53    0
  2  0    0    2   0    0     0  0    0   0   0    0     0   3922    0
  2  0    0    2   0    0     0  0    0   0   0    1     0    530    0
  2  0    0    2   0    0     0  0    0   0   0    2     0    954    0
  2  0    0    2   0    0     0  0    0   0   0    3     0    477    0
  2  0    0    2   0    0     0  0    0   0   0    4     0    265    0
  2  0    0    2   0    0     0  0    0   0   0    5     0    742    0
  2  0    0    2   0    0     0  0    0   0   0    6     0    265    0
  2  0    0    2   0    0     0  0    0   0   0    7     0    265    0
The column "sumall" is the number of times a particular combination of variables was observed in the data.
So when using k-means I would like to be able to use this column as a 'weight', so that the more frequent combinations are given more importance (and also so that the cluster features are given as weighted averages).
I can't see a simple way to do this in the standard cluster package. Can anyone advise on whether there is a simple way to do this?
Since sumall is the number of times a particular observation occurred, you can create a new dataset in which each row is replicated the correct number of times, and then do the clustering on the new dataset.
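Replication gives the weighted averages asked for because a plain mean over replicated values equals a frequency-weighted mean over the original values, and k-means centroids are plain means. A minimal check, where the values in x and the counts in w are made up for illustration:

```r
x <- c(1, 2, 3, 4)   # illustrative values for one variable
w <- c(2, 6, 4, 1)   # illustrative counts, playing the role of sumall

m_rep <- mean(rep(x, times = w))   # mean over the replicated values
m_wt  <- weighted.mean(x, w)       # frequency-weighted mean of the originals

all.equal(m_rep, m_wt)             # TRUE
```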
Here's a simple example of expanding a dataset to replicate its rows:
    df <- data.frame(a=c(1,2,3,4), b=c(4,5,6,7), c=c(7,8,9,9), sumall=c(2,6,4,1))

      a b c sumall
    1 1 4 7      2
    2 2 5 8      6
    3 3 6 9      4
    4 4 7 9      1
Then you need to expand df by replicating its rows according to sumall:
    df_expanded <- df[rep(seq_len(nrow(df)), df$sumall), ]

        a b c sumall
    1   1 4 7      2
    1.1 1 4 7      2
    2   2 5 8      6
    2.1 2 5 8      6
    2.2 2 5 8      6
    2.3 2 5 8      6
    2.4 2 5 8      6
    2.5 2 5 8      6
    3   3 6 9      4
    3.1 3 6 9      4
    3.2 3 6 9      4
    3.3 3 6 9      4
    4   4 7 9      1
Then use your favorite clustering method on the expanded data.
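As one possible sketch of that last step, the expanded toy data can be passed to kmeans() from base R's stats package. The choice of two centers, the seed, nstart = 10, and the helper names idx, features and fit are assumptions made for illustration only. The count column is dropped first so that it is not treated as a feature.

```r
df <- data.frame(a = c(1, 2, 3, 4), b = c(4, 5, 6, 7),
                 c = c(7, 8, 9, 9), sumall = c(2, 6, 4, 1))

# Row index of df repeated sumall times, then used to expand df
idx <- rep(seq_len(nrow(df)), df$sumall)
df_expanded <- df[idx, ]

# Cluster on the feature columns only; sumall is a count, not a feature
features <- df_expanded[, c("a", "b", "c")]

set.seed(1)                                   # arbitrary seed, for repeatability
fit <- kmeans(features, centers = 2, nstart = 10)

fit$centers                                   # centroids are frequency-weighted means

# One cluster label per original row: take the label of its first replicate
df$cluster <- unname(fit$cluster[match(seq_len(nrow(df)), idx)])
```

The expanded dataset has sum(sumall) rows, so with large counts it can grow much bigger than the original table.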