I have a column of Arsenic values from well water testing and in that column the first value is from before a rain event and the second value is from after a rain event. I need to make a column that subtracts the pre-rain event value from the post-rain event value. Here is a simplified version of what I am working with and what I want to create:
Well | Program | Arsenic | arsenic_change |
---|---|---|---|
A | Monthly | 3 | |
A | Rain Event | 4 | 1 |
B | Monthly | 2 | |
B | Rain Event | 6 | 4 |
C | Monthly | 1 | |
C | Rain Event | 1 | 0 |
How do I make that arsenic_change column? Right now, I have tried to mutate using this code: mutate(arsenic_change = As - lag(As)) This subtracts the previous value from every row though, and I just need the monthly values subtracted from the Rain event values for each well.
CodePudding user response:
Looks like you were pretty close. Assuming your data is ordered in the correct way, you would just need to add a group_by
for the Well so dplyr
knows to only look at each Well separately rather than all rows in the data.
library(tidyverse)
df <- tibble(
"Well" = rep(c("A", "B", "C") , each = 2),
"Program" = rep(c("Monthly", "Rain Event") , 3),
"Arsenic" = c(3, 4, 2, 6, 1, 1)
)
df <-
df %>%
group_by(Well) %>%
mutate(arsenic_change = Arsenic - lag(Arsenic))
df
#> # A tibble: 6 x 4
#> # Groups: Well [3]
#> Well Program Arsenic arsenic_change
#> <chr> <chr> <dbl> <dbl>
#> 1 A Monthly 3 NA
#> 2 A Rain Event 4 1
#> 3 B Monthly 2 NA
#> 4 B Rain Event 6 4
#> 5 C Monthly 1 NA
#> 6 C Rain Event 1 0
If it’s not ordered, because the program looks to be alphabetical order you could arrange
it as well to make sure everything is properly ordered for lag
.
df <- tibble(
"Well" = rep(c("A", "B", "C") , each = 2),
"Program" = rep(c("Rain Event", "Monthly") , 3),
"Arsenic" = c(4, 3, 6, 2, 1, 1)
)
df
#> # A tibble: 6 x 3
#> Well Program Arsenic
#> <chr> <chr> <dbl>
#> 1 A Rain Event 4
#> 2 A Monthly 3
#> 3 B Rain Event 6
#> 4 B Monthly 2
#> 5 C Rain Event 1
#> 6 C Monthly 1
df <-
df %>%
arrange(Well, Program) %>%
group_by(Well) %>%
mutate(arsenic_change = Arsenic - lag(Arsenic))
df
#> # A tibble: 6 x 4
#> # Groups: Well [3]
#> Well Program Arsenic arsenic_change
#> <chr> <chr> <dbl> <dbl>
#> 1 A Monthly 3 NA
#> 2 A Rain Event 4 1
#> 3 B Monthly 2 NA
#> 4 B Rain Event 6 4
#> 5 C Monthly 1 NA
#> 6 C Rain Event 1 0
You can also use base R with the ave
function and add any number of grouping variables.
df$arsenic_change <- ave(df$Arsenic, df$Well, FUN=function(x) c(NA, diff(x)))
e.g. with month (assuming it's ordered correctly as noted above)
df$arsenic_change <- ave(df$Arsenic, df$Well, df$Month, FUN=function(x) c(NA, diff(x)))