How can you make a new column that is based on a row where the values in that row are essentially A-CodePudding

I have a column of Arsenic values from well water testing and in that column the first value is from before a rain event and the second value is from after a rain event. I need to make a column that subtracts the pre-rain event value from the post-rain event value. Here is a simplified version of what I am working with and what I want to create:

Well	Program	Arsenic	arsenic_change
A	Monthly	3
A	Rain Event	4	1
B	Monthly	2
B	Rain Event	6	4
C	Monthly	1
C	Rain Event	1	0

How do I make that arsenic_change column? Right now, I have tried to mutate using this code: mutate(arsenic_change = As - lag(As)) This subtracts the previous value from every row though, and I just need the monthly values subtracted from the Rain event values for each well.

CodePudding user response：

Looks like you were pretty close. Assuming your data is ordered in the correct way, you would just need to add a group_by for the Well so dplyr knows to only look at each Well separately rather than all rows in the data.

library(tidyverse)

df <- tibble(
  "Well" = rep(c("A", "B", "C") , each = 2),
  "Program" = rep(c("Monthly", "Rain Event") , 3),
  "Arsenic" = c(3, 4, 2, 6, 1, 1)
)

df <-
  df %>%
  group_by(Well) %>%
  mutate(arsenic_change = Arsenic - lag(Arsenic))

df
#> # A tibble: 6 x 4
#> # Groups:   Well [3]
#>   Well  Program    Arsenic arsenic_change
#>   <chr> <chr>        <dbl>          <dbl>
#> 1 A     Monthly          3             NA
#> 2 A     Rain Event       4              1
#> 3 B     Monthly          2             NA
#> 4 B     Rain Event       6              4
#> 5 C     Monthly          1             NA
#> 6 C     Rain Event       1              0

If it’s not ordered, because the program looks to be alphabetical order you could arrange it as well to make sure everything is properly ordered for lag.

df <- tibble(
  "Well" = rep(c("A", "B", "C") , each = 2),
  "Program" = rep(c("Rain Event", "Monthly") , 3),
  "Arsenic" = c(4, 3, 6, 2, 1, 1)
)

df
#> # A tibble: 6 x 3
#>   Well  Program    Arsenic
#>   <chr> <chr>        <dbl>
#> 1 A     Rain Event       4
#> 2 A     Monthly          3
#> 3 B     Rain Event       6
#> 4 B     Monthly          2
#> 5 C     Rain Event       1
#> 6 C     Monthly          1

df <-
  df %>%
  arrange(Well, Program) %>%
  group_by(Well) %>%
  mutate(arsenic_change = Arsenic - lag(Arsenic))

df
#> # A tibble: 6 x 4
#> # Groups:   Well [3]
#>   Well  Program    Arsenic arsenic_change
#>   <chr> <chr>        <dbl>          <dbl>
#> 1 A     Monthly          3             NA
#> 2 A     Rain Event       4              1
#> 3 B     Monthly          2             NA
#> 4 B     Rain Event       6              4
#> 5 C     Monthly          1             NA
#> 6 C     Rain Event       1              0

You can also use base R with the ave function and add any number of grouping variables.

df$arsenic_change <- ave(df$Arsenic, df$Well, FUN=function(x) c(NA, diff(x)))

e.g. with month (assuming it's ordered correctly as noted above)

df$arsenic_change <- ave(df$Arsenic, df$Well, df$Month, FUN=function(x) c(NA, diff(x)))