Ranking according to value across two variables - r -


i have dataframe:

   df<-data.frame(       var1 = c(rep(c(rep(1,2), rep(2,3), rep(3,2), rep(4,1)),2), 1),       var2 = c(rep(1,8), rep(2,8),3)     )   df      var1 var2 #1     1    1 #2     1    1 #3     2    1 #4     2    1 #5     2    1 #6     3    1 #7     3    1 #8     4    1 #9     1    2 #10    1    2 #11    2    2 #12    2    2 #13    2    2 #14    3    2 #15    3    2 #16    4    2 #17    1    3 

i make third variable rank. rows highest rank if 1) have lowest numbers in var2 - , according how low numbers in var1. e.g. rows 1 , 2 var2=1 , var1=1 should ranked 1. whereas, rows 9 , 10 var2=2 , var1=1 ranked 5.

if data arranged in ascending order of var2 , var1, did following using favorite r function rle achieve ranking i'm after:

rle(df$var1) n <- length(rle(df$var1)$lengths) df$ranks  <- rep(1:n, rle(df$var1)$lengths)  df      var1 var2 ranks #1     1    1     1 #2     1    1     1 #3     2    1     2 #4     2    1     2 #5     2    1     2 #6     3    1     3 #7     3    1     3 #8     4    1     4 #9     1    2     5 #10    1    2     5 #11    2    2     6 #12    2    2     6 #13    2    2     6 #14    3    2     7 #15    3    2     7 #16    4    2     8 #17    1    3     9 

this works, requires df pre-sorted. i'd solution not need this. feel should simple one-liner using rank , i'm having blind-spot. appreciated - thanks.

edit 1:
- adding larger example testing of suggested answer

dput(df1) df1 <- structure(list(var1 = c(1l, 1l, 1l, 1l, 1l, 2l, 2l, 2l, 2l, 2l,  3l, 3l, 3l, 3l, 3l, 4l, 4l, 4l, 4l, 4l, 5l, 5l, 5l, 5l, 5l, 7l,  7l, 7l, 7l, 7l, 8l, 8l, 8l, 8l, 8l, 6l, 9l, 10l, 10l, 10l, 11l,  12l, 12l, 12l, 13l, 14l, 14l, 14l, 14l, 15l, 16l, 16l, 16l, 16l,  16l, 17l, 17l, 17l, 17l, 17l, 18l, 18l, 18l, 18l, 18l, 19l, 19l,  20l, 20l, 21l, 22l, 22l, 22l, 22l, 22l, 23l, 23l, 23l, 23l, 23l,  24l, 24l, 24l, 24l, 24l, 25l, 25l, 25l, 25l, 25l, 1l, 2l, 2l,  2l, 2l, 4l, 5l, 5l, 5l, 5l, 6l, 6l, 6l, 6l, 6l, 7l, 7l, 7l, 7l,  8l, 9l, 10l, 10l, 10l, 10l, 3l, 11l, 11l, 11l, 11l, 12l, 13l,  13l, 13l, 13l, 14l, 14l, 14l, 14l, 14l, 15l, 15l, 15l, 15l, 15l,  12l, 16l, 16l, 16l, 16l, 17l, 17l, 17l, 17l, 17l, 18l, 18l, 18l,  18l, 18l, 19l, 19l, 19l, 19l, 19l, 20l, 20l, 20l, 20l, 21l, 22l,  22l, 22l, 23l, 25l, 24l, 24l, 24l, 24l, 24l, 26l, 26l, 26l, 26l,  26l, 27l, 27l, 27l, 27l, 27l, 1l, 2l, 2l, 2l, 2l, 3l, 3l, 3l,  3l, 3l, 4l, 4l, 4l, 4l, 4l, 5l, 5l, 5l, 5l, 5l, 6l, 6l, 6l, 6l,  6l, 7l, 7l, 7l, 7l, 7l, 8l, 9l, 9l, 9l, 9l, 10l, 10l, 10l, 10l,  11l, 12l, 12l, 13l, 14l, 15l, 16l, 17l, 17l, 18l, 18l, 19l, 19l,  19l, 19l, 20l, 21l, 21l, 21l, 21l, 21l, 22l, 22l, 22l, 22l, 22l,  23l, 23l, 23l, 23l, 23l, 24l, 24l, 24l, 24l, 24l, 25l, 25l, 25l,  25l, 25l, 26l, 26l, 26l, 27l, 27l, 28l, 28l, 28l, 28l, 28l, 1l,  1l, 1l, 1l, 1l, 2l, 2l, 2l, 2l, 2l, 3l, 3l, 3l, 3l, 3l, 4l, 4l,  4l, 4l, 5l, 6l, 7l, 7l, 7l, 7l, 8l, 8l, 8l, 8l, 8l), var2 = c(1l,  1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l,  1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l,  1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l,  1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l,  1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l,  1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 2l, 2l, 2l, 2l, 2l, 2l, 2l,  2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l,  2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l,  2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l,  2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l,  2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l,  2l, 2l, 2l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l,  3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l,  3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l,  3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l,  3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l,  3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 4l, 4l, 4l,  4l, 4l, 4l, 4l, 4l, 4l, 4l, 4l, 4l, 4l, 4l, 4l, 4l, 4l, 4l, 4l,  4l, 4l, 4l, 4l, 4l, 4l, 4l, 4l, 4l, 4l, 4l), ranks = c(1l, 1l,  1l, 1l, 1l, 12l, 12l, 12l, 12l, 12l, 19l, 19l, 19l, 19l, 19l,  20l, 20l, 20l, 20l, 20l, 21l, 21l, 21l, 21l, 21l, 23l, 23l, 23l,  23l, 23l, 24l, 24l, 24l, 24l, 24l, 22l, 25l, 2l, 2l, 2l, 3l,  4l, 4l, 4l, 5l, 6l, 6l, 6l, 6l, 7l, 8l, 8l, 8l, 8l, 8l, 9l, 9l,  9l, 9l, 9l, 10l, 10l, 10l, 10l, 10l, 11l, 11l, 13l, 13l, 14l,  15l, 15l, 15l, 15l, 15l, 16l, 16l, 16l, 16l, 16l, 17l, 17l, 17l,  17l, 17l, 18l, 18l, 18l, 18l, 18l, 26l, 37l, 37l, 37l, 37l, 47l,  48l, 48l, 48l, 48l, 49l, 49l, 49l, 49l, 49l, 50l, 50l, 50l, 50l,  51l, 52l, 27l, 27l, 27l, 27l, 46l, 28l, 28l, 28l, 28l, 29l, 30l,  30l, 30l, 30l, 31l, 31l, 31l, 31l, 31l, 32l, 32l, 32l, 32l, 32l,  29l, 33l, 33l, 33l, 33l, 34l, 34l, 34l, 34l, 34l, 35l, 35l, 35l,  35l, 35l, 36l, 36l, 36l, 36l, 36l, 38l, 38l, 38l, 38l, 39l, 40l,  40l, 40l, 41l, 43l, 42l, 42l, 42l, 42l, 42l, 44l, 44l, 44l, 44l,  44l, 45l, 45l, 45l, 45l, 45l, 53l, 64l, 64l, 64l, 64l, 74l, 74l,  74l, 74l, 74l, 75l, 75l, 75l, 75l, 75l, 76l, 76l, 76l, 76l, 76l,  77l, 77l, 77l, 77l, 77l, 78l, 78l, 78l, 78l, 78l, 79l, 80l, 80l,  80l, 80l, 54l, 54l, 54l, 54l, 55l, 56l, 56l, 57l, 58l, 59l, 60l,  61l, 61l, 62l, 62l, 63l, 63l, 63l, 63l, 65l, 66l, 66l, 66l, 66l,  66l, 67l, 67l, 67l, 67l, 67l, 68l, 68l, 68l, 68l, 68l, 69l, 69l,  69l, 69l, 69l, 70l, 70l, 70l, 70l, 70l, 71l, 71l, 71l, 72l, 72l,  73l, 73l, 73l, 73l, 73l, 81l, 81l, 81l, 81l, 81l, 82l, 82l, 82l,  82l, 82l, 83l, 83l, 83l, 83l, 83l, 84l, 84l, 84l, 84l, 85l, 86l,  87l, 87l, 87l, 87l, 88l, 88l, 88l, 88l, 88l)), .names = c("var1",  "var2", "ranks"), row.names = c(na, -300l), class = "data.frame") 

the ranks variable got suggested answer:

df1$ranks1 <- dense_rank(paste(df1$var2, df1$var1)) 

solutions involving paste[0] work if values within each vector integer fixed number of digits. because paste converts character and:

  1. character (lexicographic) ordering differs numeric: rank(c(1 , 2, 11)); rank(as.character(c(1 , 2, 11)))
  2. concatenation introduces ambiguities: paste0(2,12); paste0(21,2)

peter dalgaard made relevant post in 2011 http://r.789695.n4.nabble.com/function-rank-for-data-frames-or-multiple-vectors-td3765685.html

for ignore identical rows. note rank(x) == order(order(x)) , order accepts multiple ordering columns if don't mind how identical rows split order(order(df$var2, df$var1)) job.

this splits identical rows according original ordering. there number of ways of ranking identical rows http://en.wikipedia.org/wiki/ranking#strategies_for_assigning_rankings.

in 2011 peter dalgaard suggested ave(order(order(df$var2, df$var1)), df$var2, df$var1) gives wikipedia calls 'fractional ranking' , in base::rank default ties.method="average".

your example wikipedia calls 'dense ranking' isn't available in base::rank - commented david arenburg - provided dplyr::dense_rank, can library(dyplr) , use: dense_rank(ave(order(order(df$var2, df$var1)), df$var2, df$var1))

looking @ code dense_rank

function (x)  {     r <- rank(x)     match(r, sort(unique(r))) } 

suggesting if don't want load dplyr , happy 2-statement solution creating variable - e.g. r - use

r <- ave(order(order(df$var2, df$var1)), df$var2, df$var1); match(r, sort(unique(r)))

edited add ...

you can make things little neater realising data frame list rank columns in order of occurrence:

dense_rank(ave(order(do.call(order, df)), df)) 

you ranking columns in reverse order of occurrence so

dense_rank(ave(order(do.call(order, rev(df))), df)) 

or explicitly specifying columns , order

dense_rank(ave(order(do.call(order, df[,2:1])), df[,2:1])) 

Comments

Popular posts from this blog

java - Oracle EBS .ClassNotFoundException: oracle.apps.fnd.formsClient.FormsLauncher.class ERROR -

c# - how to use buttonedit in devexpress gridcontrol -

How do you convert a timestamp into a datetime in python with the correct timezone? -