I'm looking at oxygen concentrations in relation to bottom trawling at different depths in inner Danish waters for the last 40 years.
I have a data frame (Oxy) with four columns: ID
, Date
, Depth
and Oxygen
. The Oxygen
has been measured throughout many years (Date
), at many different locations (ID
) and at many different Depths
down the water column, spanning from 0-50 meters.
I would like to create a data frame where I have the Oxygen
for the last 4 meters (Depth
) (from the bottom and 4 meters up in the water column) for each station and the corresponding date. The measurements are not by every whole meter but at varying depths. The Depth
where the Oxygen
has been measured are not the same for each ID
, so for one ID
it has been sampled at 0.2, 0.4, 0.6 meters etc. and for another ID
it has been sampled at 0.67, 1.3, 1.55 meters etc. The Depth
for each ID
also varies, so for one station the deepest measurement is at 30 meters and for another one it's 46 meters.
I have about 4 million rows, so this is just an output of my data:
ID Date Depth Oxygen
------ ---------- ----- ------
957001 2002-01-14 1.20 12.10
967503 2002-01-28 2.00 11.60
957001 2002-01-22 25.00 7.80
965206 2002-01-28 5.40 11.70
953001 2002-01-31 23.60 10.30
941101 2002-01-22 8.67 12.00
940201 2002-01-17 5.00 11.70
965404 2002-01-30 38.80 9.40
952003 2002-01-08 23.40 6.30
957101 2002-01-15 6.00 11.60
I have been searching on google for an answer but can't seem to find the right one. I can extract the highest value or the top 5 highest values by using arrange(), group_by() and slice()
. However, that wouldn't work for my data frame because the measurement intervals vary in depth and it needs to be similar for all ID's
and Dates
.
I imagine that it could be something like; take the highest value and then keep the values that are within -4 from that highest value.
So, I need to end up with all the deepest (last 4 meters for Depth
) measurements for Oxygen
dependent on ID
and Date
.
It would look something like this:
ID Date Depth Oxygen
------ ---------- ----- ------
957001 2002-01-14 30.20 2.10
967503 2002-01-28 28.00 1.60
957001 2002-01-22 29.00 7.80
965206 2002-01-28 30.40 5.70
953001 2002-01-31 23.60 10.30
941101 2002-01-22 28.67 7.00
940201 2002-01-17 30.00 8.70
965404 2002-01-30 38.80 9.40
952003 2002-01-08 23.40 6.30
957101 2002-01-15 46.00 1.60
CodePudding user response:
Just as you said, filter to Depth
greater than max() - 4
within each ID
. Using dplyr:
library(dplyr)
oxy %>%
group_by(ID) %>%
filter(Depth >= max(Depth) - 4) %>%
ungroup()
# A tibble: 9 × 4
ID Date Depth Oxygen
<dbl> <date> <dbl> <dbl>
1 967503 2002-01-28 2 11.6
2 957001 2002-01-22 25 7.8
3 965206 2002-01-28 5.4 11.7
4 953001 2002-01-31 23.6 10.3
5 941101 2002-01-22 8.67 12
6 940201 2002-01-17 5 11.7
7 965404 2002-01-30 38.8 9.4
8 952003 2002-01-08 23.4 6.3
9 957101 2002-01-15 6 11.6