I have a dataframe, which is sorted in descending order by the abc
column (and needs to remain that way). So, beginning with the first row (i.e., where ID
is 1259 and value
is 4054), I want to keep the next row that has a lower value in the value
column, which would be row 2 (i.e., value
is 3902). Then, based on that value in row 2 (3902), I would then want to select the next lowest value row (i.e., 3801
in row 4). Then, based on that value in row 4 (3801), I would select the next lowest value row (i.e., rows 5-7 with a value
of 1314), and so on. I also want to retain rows that are selected that have duplicate values (e.g., 1314
value for rows 5-7).
Data
# A tibble: 42 × 3
ID abc value
<int> <dbl> <dbl>
1 1259 4530 4054.
2 1617 4415 3902.
3 2441 4250 5286.
4 539 4161 3801.
5 1471 4012 1314.
6 687 3934 1314.
7 2436 3900 1314.
8 1295 3591 3558.
9 1738 3493 1188.
10 1747 3366 3902.
# … with 32 more rows
df <- structure(list(ID = c(1259L, 1617L, 2441L, 539L, 1471L, 687L,
2436L, 1295L, 1738L, 1747L, 672L, 2187L, 2214L, 738L, 1784L,
726L, 1712L, 2961L, 137L, 1188L, 1824L, 2856L, 953L, 2579L, 2010L,
484L, 2769L, 1736L, 2994L, 1572L, 586L, 2462L, 2180L, 1404L,
1472L, 1007L, 3000L, 545L, 2098L, 2029L, 2557L, 1623L), abc = c(4530,
4415, 4250, 4161, 4012, 3934, 3900, 3591, 3493, 3366, 3248, 3172,
3172, 3004, 3004, 2939, 2853, 2782, 2541, 2439, 2371, 2116, 2005,
1999, 1838, 1749, 1611, 1573, 1416, 1301, 1177, 1074, 975, 805,
729, 680, 538, 494, 309, 297, 155, 0), value = c(4053.69672793727,
3902.10003271411, 5286.47420784735, 3801.25160651053, 1314.10007572174,
1314.10007572174, 1314.10007572174, 3557.56126600696, 1188.20000553131,
3902.10003271411, 1543.90001773834, 1543.90001773834, 1543.90001773834,
1228.10004138947, 1228.10004138947, 1618.80007553101, 1618.80007553101,
2787.70004177094, 2877.32907307533, 1289.59995269775, 2640.10003471375,
1618.80007553101, 3557.56126600696, 3902.10003271411, 1228.10004138947,
1329.40004730225, 1228.10004138947, 3557.56126600696, 1618.80007553101,
1206.79998779297, 2117.7806730578, 1206.79998779297, 1206.79998779297,
3260.02910244849, 1191.90001773834, 1191.90001773834, 1029.70001125336,
1336.000041008, 1091.09995365143, 1234.89997673035, 1164.4999704361,
1191.90001773834)), row.names = c(NA, -42L), class = c("tbl_df",
"tbl", "data.frame"))
Expected Output
ID abc value
<int> <dbl> <dbl>
1 1259 4530 4054.
2 1617 4415 3902.
3 539 4161 3801.
4 1471 4012 1314.
5 687 3934 1314.
6 2436 3900 1314.
7 1738 3493 1188.
8 3000 538 1030.
I know I could do something like this answer to find the next lowest value for each row, but I am unsure how to go through and filter
. Since this can skip several rows, using lag
and lead
did not work for me. I would like to be able to do this with tidyverse
or data.table
or the apply
family. I would like to avoid using a for loop, unless that's the only way. I have a larger dataset, so I am looking for an efficient way to do this.
CodePudding user response:
library(dplyr)
filter(df, value == cummin(value))
# A tibble: 8 × 3
ID abc value
<int> <dbl> <dbl>
1 1259 4530 4054.
2 1617 4415 3902.
3 539 4161 3801.
4 1471 4012 1314.
5 687 3934 1314.
6 2436 3900 1314.
7 1738 3493 1188.
8 3000 538 1030.