Removing Only Adjacent Duplicates in a Data Frame in R
I have a data frame in R that is supposed to have duplicates. However, there are some duplicates that I need to remove. In particular, I only want to remove row-adjacent duplicates and keep the rest. For example, suppose I had the data frame:
df = data.frame(x = c("a", "b", "c", "a", "b", "c", "a", "b", "b", "c"), y = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
This results in the following data frame:
   x  y
1  a  1
2  b  2
3  c  3
4  a  4
5  b  5
6  c  6
7  a  7
8  b  8
9  b  9
10 c 10
In this case, I expect there to be a repeating "a, b, c, a, b, c, etc.". However, it is only a problem if I see adjacent row duplicates. In my example above, that would be rows 8 and 9, with the duplicate "b" being adjacent to each other.
In my data set, whenever this occurs, the first instance is a user error, and the second is the correct version. In rare cases, there might be an instance where the duplicates occur 3 (or more) times. However, in every case, I always want to keep the last occurrence. Thus, following the example above, the final data set would look like:
   x  y
1  a  1
2  b  2
3  c  3
4  a  4
5  b  5
6  c  6
7  a  7
9  b  9
10 c 10
Is there an easy way to do this in R? Thank you in advance for your help!
Edit: 11/19/2014 12:14 PM EST. There was a solution posted by user akron (spelling?) that has since been deleted. I am not sure why, because it seemed to work for me.
The solution was:
df = df[with(df, c(x[-1] != x[-nrow(df)], TRUE)), ]
It seems to work for me, so why was it deleted? For example, in cases with more than 2 consecutive duplicates:
df = data.frame(x = c("a", "b", "b", "b", "c", "c", "c", "a", "b", "c", "a", "b", "b", "c"), y = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14))
   x  y
1  a  1
2  b  2
3  b  3
4  b  4
5  c  5
6  c  6
7  c  7
8  a  8
9  b  9
10 c 10
11 a 11
12 b 12
13 b 13
14 c 14
> df = df[with(df, c(x[-1] != x[-nrow(df)], TRUE)), ]
> df
   x  y
1  a  1
4  b  4
7  c  7
8  a  8
9  b  9
10 c 10
11 a 11
13 b 13
14 c 14
This seems to work?
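One way to check this beyond eyeballing the printout is to compare the result against a run-length encoding of the column. The following is only a sketch using base R's rle(); the variable names (res, runs, last_idx) are made up for illustration, and as.character() is a precaution in case x is stored as a factor, which rle() does not accept.

```r
# Same 14-row example as above, with runs of three identical values
df <- data.frame(
  x = c("a", "b", "b", "b", "c", "c", "c", "a", "b", "c", "a", "b", "b", "c"),
  y = 1:14
)

# The one-liner under test
res <- df[with(df, c(x[-1] != x[-nrow(df)], TRUE)), ]

# Check 1: no adjacent duplicates remain (every run has length 1)
runs <- rle(as.character(res$x))
no_adjacent_left <- all(runs$lengths == 1)

# Check 2: the surviving rows are exactly the last row of each original run
orig_runs <- rle(as.character(df$x))
last_idx <- cumsum(orig_runs$lengths)   # position of the last element of each run
kept_last <- identical(res$y, df$y[last_idx])

print(c(no_adjacent_left = no_adjacent_left, kept_last = kept_last))
```

If both checks come back TRUE, the one-liner kept exactly the last row of each run, including the runs of three.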
Try:
df[with(df, c(x[-1] != x[-nrow(df)], TRUE)), ]
#   x  y
#1  a  1
#2  b  2
#3  c  3
#4  a  4
#5  b  5
#6  c  6
#7  a  7
#9  b  9
#10 c 10
Explanation
Here, we are comparing each element with the element preceding it. This can be done by removing the first element from the column and comparing the result with the same column with its last element removed (so that the lengths become equal).
df$x[-1]  # first element removed
#[1] b c a b c a b b c
df$x[-nrow(df)]  # last element `c` removed
#[1] a b c a b c a b b
df$x[-1] != df$x[-nrow(df)]
#[1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
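In the above, the length is 1 less than the nrow of df, because we removed 1 element. In order to compensate for that, we can concatenate TRUE at the end (the last row never has a following duplicate, so it is always kept), and use the resulting logical index for subsetting the dataset.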
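If this needs to be reused, the same idea can be wrapped in a small function. This is only a sketch: the name keep_last_of_runs and its arguments are made up for illustration, and it assumes the column contains no NA values (a comparison with NA yields NA, which would produce NA rows when subsetting).

```r
# Hypothetical helper (name and arguments are assumptions, not part of the
# original answer). Keeps the last row of every run of adjacent duplicates
# in column `col`. Assumes `col` has no NA values.
keep_last_of_runs <- function(df, col) {
  x <- df[[col]]
  n <- length(x)
  if (n < 2) return(df)  # 0 or 1 rows: nothing to compare

  # TRUE where an element differs from the one after it;
  # the trailing TRUE always keeps the final row
  keep <- c(x[-1] != x[-n], TRUE)
  df[keep, , drop = FALSE]
}

df <- data.frame(
  x = c("a", "b", "c", "a", "b", "c", "a", "b", "b", "c"),
  y = 1:10
)
result <- keep_last_of_runs(df, "x")
print(result)
```

Putting the TRUE at the front instead, as in c(TRUE, x[-1] != x[-n]), would keep the first row of each run rather than the last, which is not what the question asks for.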