R data frame manipulation pointers, conditional data manipulation

Time:01-21

I have again run into one of those "simple" data operations that seem needlessly annoying in R. I have a large dataset and want to remove rows from a data frame based on the values of two columns.

What I need is to start dropping rows when y == z, and then stop dropping rows when the value of z changes. The number of rows to remove varies each time, and the operation has to be repeated over the whole data frame.

[Images: original structure; ideal result]

I realize there are probably a million similar threads out there already, but I've already wasted enough time trying to dig through basic tutorials. I would also be interested in general tips on packages that make data frame manipulations like this simpler in R. I use things like mutate and tidyr, but is there anything that actually makes these operations less annoying?

Thanks

CodePudding user response:

How about a tidyverse approach with filter? Here is an example on mtcars that drops the rows where two variables are both equal to 4 at the same time (only the first rows of the output are shown):

library(tidyverse)
mtcars %>% filter(!(gear == 4 & carb == 4))
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2

On second thought, to drop the rows where two variables are equal to each other:

mtcars %>% filter(!(vs == am))

                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Valiant        18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1

Thanks to the comments, and after reading more carefully, I understand your problem like this: in the data below, rows 1:7 should be deleted.

data(mtcars)
df <- mtcars[11:20,c("gear","carb")]
as_tibble(df)

# A tibble: 10 × 2
    gear  carb
   <dbl> <dbl>
 1     4     4
 2     3     3
 3     3     3
 4     3     3
 5     3     4
 6     3     4
 7     3     4
 8     4     1
 9     4     2
10     4     1
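
A minimal sketch of one way to do that deletion with dplyr. It rests on my own reading of the question: gear plays the role of z and carb the role of y, dropping starts at the first row of a run where carb == gear, and it stops when gear changes. The helper column run is just a name I made up for the run counter.

```r
library(dplyr)

df <- mtcars[11:20, c("gear", "carb")]

kept <- df %>%
  # run id: goes up by one every time gear differs from the previous row
  mutate(run = cumsum(gear != lag(gear, default = first(gear)))) %>%
  group_by(run) %>%
  # within a run, keep only the rows before the first carb == gear match
  filter(cumsum(carb == gear) == 0) %>%
  ungroup() %>%
  select(-run)

kept  # leaves the last three rows (gear 4; carb 1, 2, 1)
```

If your real rule is different (for example, dropping should stop when the other column changes), only the column used to build run needs to change.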

CodePudding user response:

I found a dumb solution: using na.locf (zoo package) or ave (base), I created one ascending and one descending column with matching values in the rows that needed to be excluded.

Thank you for your input. It seems data manipulation in R is just the worst.
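
For anyone landing here later, a rough base R sketch of the ave idea. It is not the exact ascending/descending-column trick described above, only a similar grouped-flag approach, and it assumes the same reading as the mtcars example in this thread (gear as z, carb as y). The names run and drop are made up for illustration.

```r
df <- mtcars[11:20, c("gear", "carb")]

# run id: a new run starts whenever gear differs from the previous row
run <- cumsum(c(TRUE, diff(df$gear) != 0))

# within each run, count carb == gear matches seen so far;
# a row is dropped once that count is above zero
drop <- ave(as.numeric(df$carb == df$gear), run, FUN = cumsum) > 0

result <- df[!drop, ]
result  # Fiat 128, Honda Civic and Toyota Corolla remain
```

The same pattern works on any data frame as long as the rows are already in the order you care about, since the runs are defined by consecutive rows.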
