Breaking ties based on repeated counts/Subsetting data in R

Time:04-09

I'm trying to come up with a reasonable (if not clever) way of subsetting some data. Assume that when I create a table from the original data, it looks like this:

testdat <- data.frame(nom = c("A", "B", "C", "D", "E", "F", "G", "H", "I", 
"J", "K"), cts = c(100, 50, 35, 10, 10, 5, 4, 2, 1, 1, 1)) 

My idea was to cut the data after the first three points (they all have unique name/count combinations), then take points D and E as a group (they are the first group with repeated counts), and then points I, J, and K (the second group with repeated counts). In case it isn't clear what I mean by "repeated counts": there's no difference between D and E except their names; they both appear 10 times in the data.

In one sense this isn't a search for duplicates, since each row is unique; in another sense it is, since counts are repeated in the second column. We can assume the table was sorted in decreasing order, so each count is either lower than or equal to the one before it; it never increases.

How can I find the row (and row number) of the first time cts is repeated n times?

CodePudding user response:

You can get the row number of the first value that appears more than once by doing:

which(testdat$cts == rle(testdat$cts)$values[which(rle(testdat$cts)$lengths > 1)[1]])[1]
#> [1] 4

And the first entry that repeats at least three times (run length greater than 2) is

which(testdat$cts == rle(testdat$cts)$values[which(rle(testdat$cts)$lengths > 2)[1]])[1]
#> [1] 9
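
The two one-liners above differ only in the threshold, so they can be wrapped in a small function that takes n as an argument. The following is a minimal sketch, not part of the original answer: the name first_run_start is hypothetical, and it assumes (as the question states) that equal counts are always adjacent because the table is sorted.

```r
# Hypothetical helper: row number where the first run of at least n equal,
# consecutive values starts; NA if no run is that long.
first_run_start <- function(x, n) {
  r <- rle(x)                                   # run lengths and values
  hit <- which(r$lengths >= n)[1]               # first run that is long enough
  if (is.na(hit)) return(NA_integer_)
  starts <- cumsum(c(1L, head(r$lengths, -1)))  # first row of every run
  starts[hit]
}

testdat <- data.frame(
  nom = LETTERS[1:11],
  cts = c(100, 50, 35, 10, 10, 5, 4, 2, 1, 1, 1)
)

first_run_start(testdat$cts, 2)             # 4
first_run_start(testdat$cts, 3)             # 9
testdat[first_run_start(testdat$cts, 3), ]  # the row itself: nom "I", cts 1
```

Computing the run starts with cumsum calls rle only once and avoids matching on the value, so it would also work if the same count appeared again later in an unsorted vector.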

And you can get the row numbers of all rows whose count also appears in another row with

which(duplicated(testdat$cts) | rev(duplicated(rev(testdat$cts))))
#> [1]  4  5  9 10 11
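
Since the question is ultimately about taking each tied group as a subset, the same rle result can be used to label every row with its run and split the tied rows into groups. This is a sketch under the question's assumption that equal counts are adjacent; the variable names run_id, tied, and groups are illustrative and not from the original answer.

```r
testdat <- data.frame(
  nom = LETTERS[1:11],
  cts = c(100, 50, 35, 10, 10, 5, 4, 2, 1, 1, 1)
)

r <- rle(testdat$cts)
run_id <- rep(seq_along(r$lengths), r$lengths)  # run number for every row
tied <- r$lengths[run_id] > 1                   # TRUE where the count is shared

which(tied)                                     # 4 5 9 10 11

# One data frame per group of tied rows: D and E, then I, J and K
groups <- split(testdat[tied, ], run_id[tied])
groups
```

On sorted data, tied matches duplicated(testdat$cts) | duplicated(testdat$cts, fromLast = TRUE), which is a shorter way to write the double rev() expression.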