Ranking according to value across two variables - r -
i have dataframe:
df<-data.frame( var1 = c(rep(c(rep(1,2), rep(2,3), rep(3,2), rep(4,1)),2), 1), var2 = c(rep(1,8), rep(2,8),3) ) df var1 var2 #1 1 1 #2 1 1 #3 2 1 #4 2 1 #5 2 1 #6 3 1 #7 3 1 #8 4 1 #9 1 2 #10 1 2 #11 2 2 #12 2 2 #13 2 2 #14 3 2 #15 3 2 #16 4 2 #17 1 3
i make third variable rank. rows highest rank if 1) have lowest numbers in var2
- , according how low numbers in var1
. e.g. rows 1 , 2 var2=1 , var1=1 should ranked 1. whereas, rows 9 , 10 var2=2 , var1=1 ranked 5.
if data arranged in ascending order of var2 , var1, did following using favorite r function rle
achieve ranking i'm after:
rle(df$var1) n <- length(rle(df$var1)$lengths) df$ranks <- rep(1:n, rle(df$var1)$lengths) df var1 var2 ranks #1 1 1 1 #2 1 1 1 #3 2 1 2 #4 2 1 2 #5 2 1 2 #6 3 1 3 #7 3 1 3 #8 4 1 4 #9 1 2 5 #10 1 2 5 #11 2 2 6 #12 2 2 6 #13 2 2 6 #14 3 2 7 #15 3 2 7 #16 4 2 8 #17 1 3 9
this works, requires df
pre-sorted. i'd solution not need this. feel should simple one-liner using rank
, i'm having blind-spot. appreciated - thanks.
edit 1:
- adding larger example testing of suggested answer
dput(df1) df1 <- structure(list(var1 = c(1l, 1l, 1l, 1l, 1l, 2l, 2l, 2l, 2l, 2l, 3l, 3l, 3l, 3l, 3l, 4l, 4l, 4l, 4l, 4l, 5l, 5l, 5l, 5l, 5l, 7l, 7l, 7l, 7l, 7l, 8l, 8l, 8l, 8l, 8l, 6l, 9l, 10l, 10l, 10l, 11l, 12l, 12l, 12l, 13l, 14l, 14l, 14l, 14l, 15l, 16l, 16l, 16l, 16l, 16l, 17l, 17l, 17l, 17l, 17l, 18l, 18l, 18l, 18l, 18l, 19l, 19l, 20l, 20l, 21l, 22l, 22l, 22l, 22l, 22l, 23l, 23l, 23l, 23l, 23l, 24l, 24l, 24l, 24l, 24l, 25l, 25l, 25l, 25l, 25l, 1l, 2l, 2l, 2l, 2l, 4l, 5l, 5l, 5l, 5l, 6l, 6l, 6l, 6l, 6l, 7l, 7l, 7l, 7l, 8l, 9l, 10l, 10l, 10l, 10l, 3l, 11l, 11l, 11l, 11l, 12l, 13l, 13l, 13l, 13l, 14l, 14l, 14l, 14l, 14l, 15l, 15l, 15l, 15l, 15l, 12l, 16l, 16l, 16l, 16l, 17l, 17l, 17l, 17l, 17l, 18l, 18l, 18l, 18l, 18l, 19l, 19l, 19l, 19l, 19l, 20l, 20l, 20l, 20l, 21l, 22l, 22l, 22l, 23l, 25l, 24l, 24l, 24l, 24l, 24l, 26l, 26l, 26l, 26l, 26l, 27l, 27l, 27l, 27l, 27l, 1l, 2l, 2l, 2l, 2l, 3l, 3l, 3l, 3l, 3l, 4l, 4l, 4l, 4l, 4l, 5l, 5l, 5l, 5l, 5l, 6l, 6l, 6l, 6l, 6l, 7l, 7l, 7l, 7l, 7l, 8l, 9l, 9l, 9l, 9l, 10l, 10l, 10l, 10l, 11l, 12l, 12l, 13l, 14l, 15l, 16l, 17l, 17l, 18l, 18l, 19l, 19l, 19l, 19l, 20l, 21l, 21l, 21l, 21l, 21l, 22l, 22l, 22l, 22l, 22l, 23l, 23l, 23l, 23l, 23l, 24l, 24l, 24l, 24l, 24l, 25l, 25l, 25l, 25l, 25l, 26l, 26l, 26l, 27l, 27l, 28l, 28l, 28l, 28l, 28l, 1l, 1l, 1l, 1l, 1l, 2l, 2l, 2l, 2l, 2l, 3l, 3l, 3l, 3l, 3l, 4l, 4l, 4l, 4l, 5l, 6l, 7l, 7l, 7l, 7l, 8l, 8l, 8l, 8l, 8l), var2 = c(1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 4l, 4l, 4l, 4l, 4l, 4l, 4l, 4l, 4l, 4l, 4l, 4l, 4l, 4l, 4l, 4l, 4l, 4l, 4l, 4l, 4l, 4l, 4l, 4l, 4l, 4l, 4l, 4l, 4l, 4l), ranks = c(1l, 1l, 1l, 1l, 1l, 12l, 12l, 12l, 12l, 12l, 19l, 19l, 19l, 19l, 19l, 20l, 20l, 20l, 20l, 20l, 21l, 21l, 21l, 21l, 21l, 23l, 23l, 23l, 23l, 23l, 24l, 24l, 24l, 24l, 24l, 22l, 25l, 2l, 2l, 2l, 3l, 4l, 4l, 4l, 5l, 6l, 6l, 6l, 6l, 7l, 8l, 8l, 8l, 8l, 8l, 9l, 9l, 9l, 9l, 9l, 10l, 10l, 10l, 10l, 10l, 11l, 11l, 13l, 13l, 14l, 15l, 15l, 15l, 15l, 15l, 16l, 16l, 16l, 16l, 16l, 17l, 17l, 17l, 17l, 17l, 18l, 18l, 18l, 18l, 18l, 26l, 37l, 37l, 37l, 37l, 47l, 48l, 48l, 48l, 48l, 49l, 49l, 49l, 49l, 49l, 50l, 50l, 50l, 50l, 51l, 52l, 27l, 27l, 27l, 27l, 46l, 28l, 28l, 28l, 28l, 29l, 30l, 30l, 30l, 30l, 31l, 31l, 31l, 31l, 31l, 32l, 32l, 32l, 32l, 32l, 29l, 33l, 33l, 33l, 33l, 34l, 34l, 34l, 34l, 34l, 35l, 35l, 35l, 35l, 35l, 36l, 36l, 36l, 36l, 36l, 38l, 38l, 38l, 38l, 39l, 40l, 40l, 40l, 41l, 43l, 42l, 42l, 42l, 42l, 42l, 44l, 44l, 44l, 44l, 44l, 45l, 45l, 45l, 45l, 45l, 53l, 64l, 64l, 64l, 64l, 74l, 74l, 74l, 74l, 74l, 75l, 75l, 75l, 75l, 75l, 76l, 76l, 76l, 76l, 76l, 77l, 77l, 77l, 77l, 77l, 78l, 78l, 78l, 78l, 78l, 79l, 80l, 80l, 80l, 80l, 54l, 54l, 54l, 54l, 55l, 56l, 56l, 57l, 58l, 59l, 60l, 61l, 61l, 62l, 62l, 63l, 63l, 63l, 63l, 65l, 66l, 66l, 66l, 66l, 66l, 67l, 67l, 67l, 67l, 67l, 68l, 68l, 68l, 68l, 68l, 69l, 69l, 69l, 69l, 69l, 70l, 70l, 70l, 70l, 70l, 71l, 71l, 71l, 72l, 72l, 73l, 73l, 73l, 73l, 73l, 81l, 81l, 81l, 81l, 81l, 82l, 82l, 82l, 82l, 82l, 83l, 83l, 83l, 83l, 83l, 84l, 84l, 84l, 84l, 85l, 86l, 87l, 87l, 87l, 87l, 88l, 88l, 88l, 88l, 88l)), .names = c("var1", "var2", "ranks"), row.names = c(na, -300l), class = "data.frame")
the ranks variable got suggested answer:
df1$ranks1 <- dense_rank(paste(df1$var2, df1$var1))
solutions involving paste[0]
work if values within each vector integer fixed number of digits. because paste
converts character and:
- character (lexicographic) ordering differs numeric:
rank(c(1 , 2, 11)); rank(as.character(c(1 , 2, 11)))
- concatenation introduces ambiguities:
paste0(2,12); paste0(21,2)
peter dalgaard made relevant post in 2011 http://r.789695.n4.nabble.com/function-rank-for-data-frames-or-multiple-vectors-td3765685.html
for ignore identical rows. note rank(x) == order(order(x)) , order
accepts multiple ordering columns if don't mind how identical rows split order(order(df$var2, df$var1))
job.
this splits identical rows according original ordering. there number of ways of ranking identical rows http://en.wikipedia.org/wiki/ranking#strategies_for_assigning_rankings.
in 2011 peter dalgaard suggested ave(order(order(df$var2, df$var1)), df$var2, df$var1)
gives wikipedia calls 'fractional ranking' , in base::rank
default ties.method="average"
.
your example wikipedia calls 'dense ranking' isn't available in base::rank
- commented david arenburg - provided dplyr::dense_rank
, can library(dyplr)
, use: dense_rank(ave(order(order(df$var2, df$var1)), df$var2, df$var1))
looking @ code dense_rank
function (x) { r <- rank(x) match(r, sort(unique(r))) }
suggesting if don't want load dplyr , happy 2-statement solution creating variable - e.g. r
- use
r <- ave(order(order(df$var2, df$var1)), df$var2, df$var1); match(r, sort(unique(r)))
edited add ...
you can make things little neater realising data frame list rank columns in order of occurrence:
dense_rank(ave(order(do.call(order, df)), df))
you ranking columns in reverse order of occurrence so
dense_rank(ave(order(do.call(order, rev(df))), df))
or explicitly specifying columns , order
dense_rank(ave(order(do.call(order, df[,2:1])), df[,2:1]))
Comments
Post a Comment