r - Weighting k-means clustering by number of observations


I want to cluster some data using k-means in R. The data looks as follows:

adp  ns  cntr  pp2v  eml  pp1v  addps  fb  pp1d  adr  isv  pp2d  adsem  sumall  conv
  2   0     0     1    0     0      0   0     0   12    0    12      0      53     0
  2   0     0     1    0     0      0   0     0   14    0    25      0      53     0
  2   0     0     1    0     0      0   0     0   15    0     0      0      53     0
  2   0     0     1    0     0      0   0     0   15    0     4      0      53     0
  2   0     0     1    0     0      0   0     0   17    0     0      0      53     0
  2   0     0     1    0     0      0   0     0   18    0     0      0     106     0
  2   0     0     1    0     0      0   0     0   23    0    10      0      53     0
  2   0     0     1    0     0      1   0     0    0    0     1      0     106     0
  2   0     0     1    0     0      3   0     0    0    0     0      0      53     0
  2   0     0     2    0     0      0   0     0    0    0     0      0    3922     0
  2   0     0     2    0     0      0   0     0    0    0     1      0     530     0
  2   0     0     2    0     0      0   0     0    0    0     2      0     954     0
  2   0     0     2    0     0      0   0     0    0    0     3      0     477     0
  2   0     0     2    0     0      0   0     0    0    0     4      0     265     0
  2   0     0     2    0     0      0   0     0    0    0     5      0     742     0
  2   0     0     2    0     0      0   0     0    0    0     6      0     265     0
  2   0     0     2    0     0      0   0     0    0    0     7      0     265     0

The column "sumall" is the number of times that particular combination of variables was observed in the data.

So when using k-means, I would like to be able to use this column as a 'weight' for each particular combination, so that more frequent combinations are given more importance (and so that the cluster features are given as weighted averages).

I can't see a simple way to do this in the standard cluster package. Can anyone advise on whether there is a simple way to do this?

Since sumall is the number of times a particular observation occurred, you could create a new dataset in which each row is replicated the correct number of times, and then do the clustering on the new dataset.

Here's a simple example of expanding a dataset to replicate rows:

    df <- data.frame(a = c(1, 2, 3, 4), b = c(4, 5, 6, 7), c = c(7, 8, 9, 9), sumall = c(2, 6, 4, 1))
    df

      a b c sumall
    1 1 4 7      2
    2 2 5 8      6
    3 3 6 9      4
    4 4 7 9      1

Then you need to expand df by replicating its rows according to sumall:

    df_expanded <- df[rep(seq_len(nrow(df)), df$sumall), ]
    df_expanded

        a b c sumall
    1   1 4 7      2
    1.1 1 4 7      2
    2   2 5 8      6
    2.1 2 5 8      6
    2.2 2 5 8      6
    2.3 2 5 8      6
    2.4 2 5 8      6
    2.5 2 5 8      6
    3   3 6 9      4
    3.1 3 6 9      4
    3.2 3 6 9      4
    3.3 3 6 9      4
    4   4 7 9      1
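
As a quick sanity check on the expansion, the sketch below rebuilds the same toy df and confirms two things: the expanded data has sum(df$sumall) rows, and its plain column means match the sumall-weighted means of the original rows. The names n_expected, features, w_means and e_means are only illustrative.

```r
df <- data.frame(a = c(1, 2, 3, 4), b = c(4, 5, 6, 7),
                 c = c(7, 8, 9, 9), sumall = c(2, 6, 4, 1))
df_expanded <- df[rep(seq_len(nrow(df)), df$sumall), ]

# One row per original observation: 2 + 6 + 4 + 1 = 13
n_expected <- sum(df$sumall)
nrow(df_expanded) == n_expected

# Weighted means of the original rows, using sumall as the weight
features <- c("a", "b", "c")
w_means <- sapply(df[features], weighted.mean, w = df$sumall)

# Unweighted means of the expanded rows
e_means <- colMeans(df_expanded[features])

# The two should agree up to floating-point tolerance
all.equal(w_means, e_means)
```

This is why clustering the replicated rows should give cluster centers that are weighted averages: k-means centers are ordinary means over the rows assigned to each cluster.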

Then use your favorite clustering method on the expanded data.
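
For instance, with base R's kmeans() the last step might look like the sketch below. The choice of two clusters, the seed and nstart = 10 are arbitrary assumptions for this toy data, and the names feats, fit and first_idx are illustrative. Note that sumall is dropped before clustering so that the weight itself is not treated as a feature.

```r
df <- data.frame(a = c(1, 2, 3, 4), b = c(4, 5, 6, 7),
                 c = c(7, 8, 9, 9), sumall = c(2, 6, 4, 1))
idx <- rep(seq_len(nrow(df)), df$sumall)   # original row index of each expanded row
df_expanded <- df[idx, ]

# Cluster on the feature columns only; sumall has already been used for replication
feats <- df_expanded[, c("a", "b", "c")]

set.seed(1)                                # k-means starts from random centers
fit <- kmeans(feats, centers = 2, nstart = 10)

fit$centers   # each center is the mean of the replicated rows in that cluster
fit$size      # sizes count observations, not distinct combinations

# Map the cluster labels back onto the distinct combinations in df,
# taking the label of the first replicated copy of each original row
first_idx <- match(seq_len(nrow(df)), idx)
df$cluster <- unname(fit$cluster[first_idx])
df
```

One caveat with this approach: replication multiplies the number of rows, so with large counts (the question's data has a sumall of 3922 for one combination) the expanded dataset can become much bigger than the original.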

