Removing Only Adjacent Duplicates in a Data Frame in R

- January 15, 2014

I have a data frame in R that is supposed to have duplicates. However, there are some duplicates that I need to remove. In particular, I only want to remove row-adjacent duplicates and keep the rest. For example, suppose I had this data frame:

df = data.frame(x = c("a", "b", "c", "a", "b", "c", "a", "b", "b", "c"),
                y = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))

This results in the following data frame:

x   y
a   1
b   2
c   3
a   4
b   5
c   6
a   7
b   8
b   9
c   10

In this case, I expect there to be a repeating "a, b, c, a, b, c, etc.". It is only a problem if I see adjacent row duplicates. In my example above, that would be rows 8 and 9, where the duplicate "b" values are adjacent to each other.

In my data set, whenever this occurs, the first instance is a user error and the second is the correct version. In rare cases, there might be an instance where the duplicates occur 3 (or more) times. However, in every case, I want to keep the last occurrence. Thus, following the example above, I would like the final data set to look like:

a   1
b   2
c   3
a   4
b   5
c   6
a   7
b   9
c   10

Is there an easy way to do this in R? Thank you in advance for your help!

Edit: 11/19/2014 12:14 PM EST. There was a solution posted by the user akron (spelling?) that has since been deleted. I am not sure why, because it seemed to work for me.

The solution was:

df = df[with(df, c(x[-1] != x[-nrow(df)], TRUE)), ]

It seems to work for me, so why was it deleted? For example, in cases with more than 2 consecutive duplicates:

df = data.frame(x = c("a", "b", "b", "b", "c", "c", "c", "a", "b", "c", "a", "b", "b", "c"),
                y = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14))
   x  y
1  a  1
2  b  2
3  b  3
4  b  4
5  c  5
6  c  6
7  c  7
8  a  8
9  b  9
10 c 10
11 a 11
12 b 12
13 b 13
14 c 14
> df = df[with(df, c(x[-1] != x[-nrow(df)], TRUE)), ]
> df
   x  y
1  a  1
4  b  4
7  c  7
8  a  8
9  b  9
10 c 10
11 a 11
13 b 13
14 c 14

This seems to work?
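One way to check the result beyond eyeballing it is to compare it against run lengths from base R's rle(). The sketch below assumes x is stored as a character vector (hence stringsAsFactors = FALSE) and contains no NA values; the names res, no_adjacent_dups, last_of_run and same_rows are made up for illustration.

```r
# Three-or-more-duplicates example from above, with x kept as character
df <- data.frame(x = c("a", "b", "b", "b", "c", "c", "c",
                       "a", "b", "c", "a", "b", "b", "c"),
                 y = 1:14, stringsAsFactors = FALSE)

# The posted solution: keep a row when the next row's x differs, plus the last row
res <- df[with(df, c(x[-1] != x[-nrow(df)], TRUE)), ]

# After filtering, every run of identical adjacent values should have length 1
no_adjacent_dups <- all(rle(res$x)$lengths == 1)

# cumsum() of the original run lengths gives the index of the last row of each run;
# those should be exactly the rows that were kept
last_of_run <- cumsum(rle(df$x)$lengths)
same_rows <- identical(as.integer(rownames(res)), last_of_run)
```

If both no_adjacent_dups and same_rows come back TRUE, the filter kept exactly one row per run, and it was the last one.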

Try:

 df[with(df, c(x[-1] != x[-nrow(df)], TRUE)), ]
 #   x  y
 #1  a  1
 #2  b  2
 #3  c  3
 #4  a  4
 #5  b  5
 #6  c  6
 #7  a  7
 #9  b  9
 #10 c 10

Explanation

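Here, we are comparing each element with the element next to it. This can be done by removing the first element from the column and comparing the result with the same column with its last element removed (so that the lengths become equal). Position i of the comparison is then TRUE exactly when row i + 1 differs from row i.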

 df$x[-1] # first element removed
 #[1] b c a b c a b b c
 df$x[-nrow(df)] # last element `c` removed
 #[1] a b c a b c a b b
 df$x[-1] != df$x[-nrow(df)]
 #[1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE

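In the above, the length is 1 less than the nrow of df because we removed one element. To compensate for that, we can concatenate a TRUE at the end and then use the result as a logical index for subsetting the dataset. The appended TRUE means the final row is always kept, and within any run of adjacent duplicates only the last row has a TRUE, which is why the last occurrence survives.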
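For reuse, the same comparison can be wrapped in a small function. This is only a sketch: keep_last_of_runs is a hypothetical name, it assumes the column has no NA values (a comparison involving NA would produce NA in the index), and it converts the column to character so that it behaves the same whether x is a factor or a character vector.

```r
# Hypothetical helper: keep only the last row of each run of adjacent
# duplicates in column `col`. Assumes `col` contains no NA values.
keep_last_of_runs <- function(df, col) {
  x <- as.character(df[[col]])
  n <- length(x)
  if (n < 2) return(df)            # nothing to compare with 0 or 1 rows
  keep <- c(x[-1] != x[-n], TRUE)  # row i kept if row i + 1 differs; last row always kept
  df[keep, , drop = FALSE]         # drop = FALSE keeps a data frame even with one column
}

df <- data.frame(x = c("a", "b", "c", "a", "b", "c", "a", "b", "b", "c"),
                 y = 1:10)
res <- keep_last_of_runs(df, "x")
res$y
# [1]  1  2  3  4  5  6  7  9 10
```

Row 8, the first of the two adjacent "b" rows, is dropped, and row 9 is kept, matching the output above.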
