I ran into one of those "simple" data operations that seem needlessly annoying in R again. I have a large dataset and want to remove rows in a data frame based on the values of two columns.
What I need is to start dropping rows when y == z, and then stop dropping rows when the value of z changes. The number of rows to remove varies, and I need to repeat this operation across the whole data frame.
[Figures: original structure; ideal result]
I realize there are probably a million similar threads out there already, but I've already wasted enough time trying to dig through basic tutorials. I would also be interested in general tips on packages that make data frame manipulations like this simpler in R. I use things like mutate and tidyr, but is there anything that actually makes these operations less annoying?
Thanks
CodePudding user response:
How about a tidyverse approach with filter()? Here is test data, keeping only the rows where gear and carb are not both equal to 4:
library(tidyverse)
mtcars %>% filter(!(gear == 4 & carb == 4))
mpg cyl disp hp drat wt qsec vs am gear carb
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
On second thought, here is how to drop the rows where two variables are equal:
mtcars %>% filter(!(vs == am))
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Thanks to the comments, and after reading more carefully, I understand your problem like this: in the example below, rows 1:7 should be deleted.
data(mtcars)
df <- mtcars[11:20,c("gear","carb")]
as_tibble(df)
# A tibble: 10 × 2
gear carb
<dbl> <dbl>
1 4 4
2 3 3
3 3 3
4 3 3
5 3 4
6 3 4
7 3 4
8 4 1
9 4 2
10 4 1
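The answer stops at the example data, so here is a minimal sketch of one way to delete rows 1:7 with dplyr. It assumes gear plays the role of z and carb the role of y, and it reads the rule as: within each run of consecutive identical gear values, drop everything from the first row where carb == gear to the end of that run. The helper column run_id is a name made up for this illustration.

```r
library(dplyr)

df <- mtcars[11:20, c("gear", "carb")]

result <- df %>%
  # label runs of consecutive identical gear values (gear = z, an assumption)
  mutate(run_id = cumsum(gear != lag(gear, default = first(gear)))) %>%
  group_by(run_id) %>%
  # within a run, keep only the rows before the first carb == gear match
  filter(cumsum(carb == gear) == 0) %>%
  ungroup() %>%
  select(-run_id)

result
```

On this example only the last three rows (gear 4 with carb 1, 2, 1) remain. If the intended rule differs, for instance if dropping should continue across runs, the filter condition would need to change.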
CodePudding user response:
I found a dumb solution: using na.locf() (zoo package) or ave() (base), I created one ascending and one descending column with matching values in the rows that needed exclusion.
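For anyone landing here later, this is a rough base R sketch of the ave() idea. It is not necessarily the exact ascending/descending-column construction described above, just one reading of the original rule. The toy data frame and the names run_id, drop, and kept are invented for the illustration.

```r
# Toy data (hypothetical): y and z as in the question
df <- data.frame(
  y = c(1, 2, 5, 5, 7, 3, 3, 9),
  z = c(9, 2, 2, 2, 4, 4, 3, 3)
)

# Run id: increases by one every time z changes from the previous row
run_id <- cumsum(c(TRUE, df$z[-1] != df$z[-nrow(df)]))

# Within each run of z, count the y == z matches seen so far;
# a row is dropped once at least one match has occurred in its run
drop <- ave(as.integer(df$y == df$z), run_id, FUN = cumsum) > 0

kept <- df[!drop, ]
kept
```

Here rows 2 to 4 and 7 to 8 are removed, leaving rows 1, 5, and 6.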
Thank you for your input. It seems data manipulation in R is just the worst.