Home > front end >  Remove not increasing rows based on other columns values
Remove not increasing rows based on other columns values

Time:05-06

I have a data frame on R and I want to remove all rows that are not increasing in my column 3. Each row have to be higher or equal than the previous one. But my main difficulty is that the rows have to increase according other columns 1 and 2. In my example, Column3 have to increase according Column1 [A-B] and 2 [1:4]. Here, Column1 [B] have to be removed because 199>197.

PS : It is CO2 measurements corresponding to many plots and date. When the CO2 measurments is not monotonous in the time, the measurement is wrong.

Column1 Column2 Column3
A 1 200
A 2 202
A 3 204
A 4 207
B 1 199
B 2 197
B 3 200
B 4 202

CodePudding user response:

There may be an easier way to go about it, but here is an approach:

If you want just the observation that violates the condition removed (here, the observation with value 197 in it), try this:

df %>% group_by(Column1) %>% 
  mutate(del = (lag(Column3) > Column3)) %>% 
  filter(!del|is.na(del)) %>% 
  select(-del)

Output:

# Column1 Column2 Column3
# <chr>     <int>   <int>
# 1 A             1     200
# 2 A             2     202
# 3 A             3     204
# 4 A             4     207
# 5 B             1     199
# 6 B             3     200
# 7 B             4     202

If you want to remove all the observations from a given group where the condition is not met (here, group b)

df %>% group_by(Column1) %>% 
  mutate(del = any((lag(Column3) > Column3))) %>% 
  filter(!del|is.na(del)) %>% 
  select(-del)

Output:

# Column1 Column2 Column3
# <chr>     <int>   <int>
# 1 A             1     200
# 2 A             2     202
# 3 A             3     204
# 4 A             4     207

Data used in this example:

df <- read.table(text = "Column1    Column2 Column3
A   1   200
A   2   202
A   3   204
A   4   207
B   1   199
B   2   197
B   3   200
B   4   202", header = TRUE)

CodePudding user response:

You can use diff() to determine if a group is increasing.

library(dplyr)

df %>%
  group_by(Column1) %>%
  filter(all(diff(Column3) >= 0)) %>%
  ungroup()

# # A tibble: 4 × 3
#   Column1 Column2 Column3
#   <chr>     <int>   <int>
# 1 A             1     200
# 2 A             2     202
# 3 A             3     204
# 4 A             4     207
  • Related