I am looking to calculate the TDO3 value for every date in 2020. I have interpolated data sets of both temperature and dissolved oxygen (DO) in 0.25 m increments from 1 m to 22 m below the surface, covering Jan-1-2020 through Dec-31-2020.
TDO3 is the temperature at the depth where dissolved oxygen is 3 mg/L. Below is a snippet of the merged data set.
> print(do_temp, n=85)
# A tibble: 31,110 x 4
date depth mean_temp mean_do
<date> <dbl> <dbl> <dbl>
1 2020-01-01 1 2.12 11.6
2 2020-01-01 1.25 2.19 11.5
3 2020-01-01 1.5 2.27 11.4
4 2020-01-01 1.75 2.34 11.3
5 2020-01-01 2 2.42 11.2
6 2020-01-01 2.25 2.40 11.2
7 2020-01-01 2.5 2.39 11.1
8 2020-01-01 2.75 2.38 11.1
9 2020-01-01 3 2.37 11.0
10 2020-01-01 3.25 2.41 11.0
11 2020-01-01 3.5 2.46 11.0
12 2020-01-01 3.75 2.50 10.9
13 2020-01-01 4 2.55 10.9
14 2020-01-01 4.25 2.54 10.9
15 2020-01-01 4.5 2.53 10.9
16 2020-01-01 4.75 2.52 11.0
17 2020-01-01 5 2.51 11.0
18 2020-01-01 5.25 2.50 11.0
19 2020-01-01 5.5 2.49 11.0
20 2020-01-01 5.75 2.49 11.1
21 2020-01-01 6 2.48 11.1
22 2020-01-01 6.25 2.49 10.9
23 2020-01-01 6.5 2.51 10.8
24 2020-01-01 6.75 2.52 10.7
25 2020-01-01 7 2.54 10.5
26 2020-01-01 7.25 2.55 10.4
27 2020-01-01 7.5 2.57 10.2
28 2020-01-01 7.75 2.58 10.1
29 2020-01-01 8 2.60 9.95
30 2020-01-01 8.25 2.63 10.1
31 2020-01-01 8.5 2.65 10.2
32 2020-01-01 8.75 2.68 10.3
33 2020-01-01 9 2.71 10.5
34 2020-01-01 9.25 2.69 10.6
35 2020-01-01 9.5 2.67 10.7
36 2020-01-01 9.75 2.65 10.9
37 2020-01-01 10 2.63 11.0
38 2020-01-01 10.2 2.65 10.8
39 2020-01-01 10.5 2.67 10.6
40 2020-01-01 10.8 2.69 10.3
41 2020-01-01 11 2.72 10.1
42 2020-01-01 11.2 2.75 9.89
43 2020-01-01 11.5 2.78 9.67
44 2020-01-01 11.8 2.81 9.44
45 2020-01-01 12 2.84 9.22
46 2020-01-01 12.2 2.83 9.39
47 2020-01-01 12.5 2.81 9.56
48 2020-01-01 12.8 2.80 9.74
49 2020-01-01 13 2.79 9.91
50 2020-01-01 13.2 2.80 10.1
51 2020-01-01 13.5 2.81 10.3
52 2020-01-01 13.8 2.82 10.4
53 2020-01-01 14 2.83 10.6
54 2020-01-01 14.2 2.86 10.5
55 2020-01-01 14.5 2.88 10.4
56 2020-01-01 14.8 2.91 10.2
57 2020-01-01 15 2.94 10.1
58 2020-01-01 15.2 2.95 10.0
59 2020-01-01 15.5 2.96 9.88
60 2020-01-01 15.8 2.97 9.76
61 2020-01-01 16 2.98 9.65
62 2020-01-01 16.2 2.99 9.53
63 2020-01-01 16.5 3.00 9.41
64 2020-01-01 16.8 3.01 9.30
65 2020-01-01 17 3.03 9.18
66 2020-01-01 17.2 3.05 9.06
67 2020-01-01 17.5 3.07 8.95
68 2020-01-01 17.8 3.09 8.83
69 2020-01-01 18 3.11 8.71
70 2020-01-01 18.2 3.13 8.47
71 2020-01-01 18.5 3.14 8.23
72 2020-01-01 18.8 3.16 7.98
73 2020-01-01 19 3.18 7.74
74 2020-01-01 19.2 3.18 7.50
75 2020-01-01 19.5 3.18 7.25
76 2020-01-01 19.8 3.18 7.01
77 2020-01-01 20 3.18 6.77
78 2020-01-01 20.2 3.18 5.94
79 2020-01-01 20.5 3.18 5.10
80 2020-01-01 20.8 3.18 4.27
81 2020-01-01 21 3.18 3.43
82 2020-01-01 21.2 3.22 2.60
83 2020-01-01 21.5 3.25 1.77
84 2020-01-01 21.8 3.29 0.934
85 2020-01-01 22 3.32 0.100
# ... with 31,025 more rows
https://github.com/TRobin82/WaterQuality
The above link will get you to the raw data.
What I am looking for is a data frame like the one below, but with 366 rows, one for each date in the year.
> TDO3
dates tdo3
1 2020-1-1 3.183500
2 2020-2-1 3.341188
3 2020-3-1 3.338625
4 2020-4-1 3.437000
5 2020-5-1 4.453310
6 2020-6-1 5.887560
7 2020-7-1 6.673700
8 2020-8-1 7.825672
9 2020-9-1 8.861190
10 2020-10-1 11.007972
11 2020-11-1 7.136880
12 2020-12-1 2.752500
However, a DO value of exactly 3 mg/L does not appear in the interpolated DO data. I would therefore need the function to find the value closest to 3 without going below it, then use the depth of that value to look up the matching temperature in the temperature data.
I assume a for-loop is the best route, but I am not sure of the proper way to go about it.
CodePudding user response:
Here's one way of doing it with tidyverse-style functions. Note that this code is reproducible because anyone can run it and should get the same answer. It's great that you showed us your data, but it's even better to post the output of dput(), because then people can load the data and start helping you immediately.
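As a small illustration of the dput() suggestion, here is a sketch using a made-up two-row data frame (example_df is a hypothetical name; the values are copied from two rows of your printout, with the depth 21.25 assumed from the 0.25 m spacing). With the real data you would more likely call dput() on a small slice, for example head(do_temp, 20).

```r
# Hypothetical miniature of the data, only to show what dput() produces.
example_df <- data.frame(
  depth     = c(21, 21.25),
  mean_temp = c(3.18, 3.22),
  mean_do   = c(3.43, 2.60)
)

# Prints a structure(...) call that can be pasted straight into a question.
dput(example_df)

# Round trip: capture the printed text, parse it, and rebuild the object,
# which is what a reader does when they paste the dput() output into R.
rebuilt <- eval(parse(text = capture.output(dput(example_df))))
stopifnot(isTRUE(all.equal(rebuilt, example_df)))
```

Anyone who copies the printed structure(...) text can assign it to a name and have the same object without downloading anything.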
This code does the following:
- Loads the data from the link you provided. There were several data files, so I had to guess which one you meant.
- Groups the observations by date.
- Puts the observations in increasing order of mean_do.
- Removes rows with values of mean_do that are strictly less than 3.
- Takes the first ordered observation for each date (this will be the one with the lowest value of mean_do that is greater than or equal to 3).
- Renames the column mean_temp as tdo3, since it's the temperature for that date when the dissolved oxygen level was closest to 3 mg/L.
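To see what these steps do before running them on the full file, here is a sketch of the same logic on a tiny made-up table. The names toy and toy_tdo3 and all the values are invented for illustration, and it uses slice_min() as a compact stand-in for the arrange-then-take-first steps, which is a swap on my part.

```r
library(dplyr)

# Made-up profiles for two dates; the values are illustrative, not from the real file.
toy <- tibble(
  date      = as.Date(c("2020-01-01", "2020-01-01", "2020-01-01",
                        "2020-01-02", "2020-01-02", "2020-01-02")),
  depth     = c(20, 21, 22, 20, 21, 22),
  mean_temp = c(3.10, 3.18, 3.25, 3.12, 3.20, 3.30),
  mean_do   = c(6.77, 3.43, 2.60, 5.00, 3.00, 1.50)
)

toy_tdo3 <- toy %>%
  group_by(date) %>%
  filter(mean_do >= 3) %>%                          # drop readings below 3 mg/L
  slice_min(mean_do, n = 1, with_ties = FALSE) %>%  # smallest remaining DO per date
  ungroup() %>%
  select(date, tdo3 = mean_temp)                    # keep its temperature as tdo3

toy_tdo3
```

For the first date the row with DO 3.43 is kept (tdo3 = 3.18), and for the second date the row with DO exactly 3.00 is kept (tdo3 = 3.20), which shows why the comparison has to include equality.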
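library(tidyverse)

# Read the merged file. The CSV has an unnamed index column whose repaired name
# depends on the readr version (X1 in older releases, ...1 in newer ones),
# so keep the needed columns by name instead of dropping that one.
do_temp <- read_csv("https://raw.githubusercontent.com/TRobin82/WaterQuality/main/DateDepthTempDo.csv") %>%
  select(date, depth, mean_temp, mean_do)

do_temp %>%
  group_by(date) %>%
  arrange(mean_do) %>%          # ascending DO; the order is kept within each date
  filter(mean_do >= 3) %>%      # drop readings strictly below 3 mg/L
  slice_head(n = 1) %>%         # lowest remaining DO for each date
  rename(tdo3 = mean_temp) %>%
  select(date, tdo3)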
Here are the results. They're a bit different from the ones you posted, so I'm not sure if I've misunderstood you or if those were just illustrative and not real results.
# A tibble: 366 x 2
# Groups: date [366]
date tdo3
<date> <dbl>
1 2020-01-01 3.18
2 2020-01-02 3.18
3 2020-01-03 3.19
4 2020-01-04 3.21
5 2020-01-05 3.21
6 2020-01-06 3.21
7 2020-01-07 3.24
8 2020-01-08 3.28
9 2020-01-09 3.27
10 2020-01-10 3.28
# ... with 356 more rows
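If picking the nearest reading at or above 3 mg/L is too coarse, one possible refinement is to interpolate linearly between the two readings that bracket 3 mg/L. The sketch below is only an assumption about what might be wanted: tdo_interp is a hypothetical helper, not part of any package, and I don't know whether the values in your question were produced this way. The sample rows are the three readings around the crossing on 2020-01-01 from your printout, with depth 21.25 assumed from the 0.25 m spacing.

```r
library(dplyr)

# Hypothetical helper: temperature at the first depth (going down) where DO
# drops from >= target to < target, by linear interpolation between the
# two bracketing readings. Returns NA if DO never crosses the target.
tdo_interp <- function(depth, temp, do, target = 3) {
  o <- order(depth)
  temp <- temp[o]
  do <- do[o]
  n <- length(do)
  if (n < 2) return(NA_real_)
  i <- which(do[-n] >= target & do[-1] < target)
  if (length(i) == 0) return(NA_real_)
  i <- i[1]
  # Fraction of the way from reading i to reading i + 1 where DO equals target.
  w <- (do[i] - target) / (do[i] - do[i + 1])
  temp[i] + w * (temp[i + 1] - temp[i])
}

# Three readings around the crossing on 2020-01-01.
profile <- tibble(
  date      = as.Date("2020-01-01"),
  depth     = c(21, 21.25, 21.5),
  mean_temp = c(3.18, 3.22, 3.25),
  mean_do   = c(3.43, 2.60, 1.77)
)

result <- profile %>%
  group_by(date) %>%
  summarise(tdo3 = tdo_interp(depth, mean_temp, mean_do))

result
```

For these rows the interpolated temperature lands between 3.18 and 3.22, at roughly 3.20. The same summarise() call should work on the full do_temp table, giving one row per date.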
Let me know if you were looking for something else.