Function for finding Temperature at Dissolved Oxygen of 3 (TDO3) value across a whole year

I want to calculate the TDO3 value for every date in 2020. I have interpolated data sets of both temperature and dissolved oxygen (DO) in 0.25 m increments from 1 m to 22 m below the surface, covering Jan-1-2020 through Dec-31-2020.

TDO3 is the water temperature at the depth where dissolved oxygen is 3 mg/L. Below is a snippet of the merged data set.

> print(do_temp, n=85)
# A tibble: 31,110 x 4
   date       depth mean_temp mean_do
   <date>     <dbl>     <dbl>   <dbl>
 1 2020-01-01  1         2.12  11.6  
 2 2020-01-01  1.25      2.19  11.5  
 3 2020-01-01  1.5       2.27  11.4  
 4 2020-01-01  1.75      2.34  11.3  
 5 2020-01-01  2         2.42  11.2  
 6 2020-01-01  2.25      2.40  11.2  
 7 2020-01-01  2.5       2.39  11.1  
 8 2020-01-01  2.75      2.38  11.1  
 9 2020-01-01  3         2.37  11.0  
10 2020-01-01  3.25      2.41  11.0  
11 2020-01-01  3.5       2.46  11.0  
12 2020-01-01  3.75      2.50  10.9  
13 2020-01-01  4         2.55  10.9  
14 2020-01-01  4.25      2.54  10.9  
15 2020-01-01  4.5       2.53  10.9  
16 2020-01-01  4.75      2.52  11.0  
17 2020-01-01  5         2.51  11.0  
18 2020-01-01  5.25      2.50  11.0  
19 2020-01-01  5.5       2.49  11.0  
20 2020-01-01  5.75      2.49  11.1  
21 2020-01-01  6         2.48  11.1  
22 2020-01-01  6.25      2.49  10.9  
23 2020-01-01  6.5       2.51  10.8  
24 2020-01-01  6.75      2.52  10.7  
25 2020-01-01  7         2.54  10.5  
26 2020-01-01  7.25      2.55  10.4  
27 2020-01-01  7.5       2.57  10.2  
28 2020-01-01  7.75      2.58  10.1  
29 2020-01-01  8         2.60   9.95 
30 2020-01-01  8.25      2.63  10.1  
31 2020-01-01  8.5       2.65  10.2  
32 2020-01-01  8.75      2.68  10.3  
33 2020-01-01  9         2.71  10.5  
34 2020-01-01  9.25      2.69  10.6  
35 2020-01-01  9.5       2.67  10.7  
36 2020-01-01  9.75      2.65  10.9  
37 2020-01-01 10         2.63  11.0  
38 2020-01-01 10.2       2.65  10.8  
39 2020-01-01 10.5       2.67  10.6  
40 2020-01-01 10.8       2.69  10.3  
41 2020-01-01 11         2.72  10.1  
42 2020-01-01 11.2       2.75   9.89 
43 2020-01-01 11.5       2.78   9.67 
44 2020-01-01 11.8       2.81   9.44 
45 2020-01-01 12         2.84   9.22 
46 2020-01-01 12.2       2.83   9.39 
47 2020-01-01 12.5       2.81   9.56 
48 2020-01-01 12.8       2.80   9.74 
49 2020-01-01 13         2.79   9.91 
50 2020-01-01 13.2       2.80  10.1  
51 2020-01-01 13.5       2.81  10.3  
52 2020-01-01 13.8       2.82  10.4  
53 2020-01-01 14         2.83  10.6  
54 2020-01-01 14.2       2.86  10.5  
55 2020-01-01 14.5       2.88  10.4  
56 2020-01-01 14.8       2.91  10.2  
57 2020-01-01 15         2.94  10.1  
58 2020-01-01 15.2       2.95  10.0  
59 2020-01-01 15.5       2.96   9.88 
60 2020-01-01 15.8       2.97   9.76 
61 2020-01-01 16         2.98   9.65 
62 2020-01-01 16.2       2.99   9.53 
63 2020-01-01 16.5       3.00   9.41 
64 2020-01-01 16.8       3.01   9.30 
65 2020-01-01 17         3.03   9.18 
66 2020-01-01 17.2       3.05   9.06 
67 2020-01-01 17.5       3.07   8.95 
68 2020-01-01 17.8       3.09   8.83 
69 2020-01-01 18         3.11   8.71 
70 2020-01-01 18.2       3.13   8.47 
71 2020-01-01 18.5       3.14   8.23 
72 2020-01-01 18.8       3.16   7.98 
73 2020-01-01 19         3.18   7.74 
74 2020-01-01 19.2       3.18   7.50 
75 2020-01-01 19.5       3.18   7.25 
76 2020-01-01 19.8       3.18   7.01 
77 2020-01-01 20         3.18   6.77 
78 2020-01-01 20.2       3.18   5.94 
79 2020-01-01 20.5       3.18   5.10 
80 2020-01-01 20.8       3.18   4.27 
81 2020-01-01 21         3.18   3.43 
82 2020-01-01 21.2       3.22   2.60 
83 2020-01-01 21.5       3.25   1.77 
84 2020-01-01 21.8       3.29   0.934
85 2020-01-01 22         3.32   0.100
# ... with 31,025 more rows

https://github.com/TRobin82/WaterQuality

The above link will get you to the raw data.

What I am looking for is a data frame that looks like the one below, but with 366 rows, one for each date in the year.

> TDO3
       dates      tdo3
1   2020-1-1  3.183500
2   2020-2-1  3.341188
3   2020-3-1  3.338625
4   2020-4-1  3.437000
5   2020-5-1  4.453310
6   2020-6-1  5.887560
7   2020-7-1  6.673700
8   2020-8-1  7.825672
9   2020-9-1  8.861190
10 2020-10-1 11.007972
11 2020-11-1  7.136880
12 2020-12-1  2.752500

However, a DO value of exactly 3 mg/L does not appear in the interpolated DO data. So for each date, the function would need to find the DO value closest to 3 without going below it, then use the depth of that value to look up the temperature at the same depth in the temperature data.

I am assuming a for-loop is the best route, but I'm not sure of the proper way to go about it.

CodePudding user response:

Here's one way of doing it with tidyverse-style functions. Note that this code is reproducible: anyone can run it and should get the same answer. It's great that you showed us your data, but it's even better to post the output of dput(), because then people can load the data and start helping you immediately.
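In case dput() is new to you, here is a minimal sketch of what I mean. The data frame sample_rows is a made-up stand-in for your do_temp, built from the first three rows you posted; the name is only for illustration.

```r
# Small stand-in for do_temp (values copied from the first rows in the question)
sample_rows <- data.frame(
  date      = as.Date("2020-01-01"),
  depth     = c(1, 1.25, 1.5),
  mean_temp = c(2.12, 2.19, 2.27),
  mean_do   = c(11.6, 11.5, 11.4)
)

# dput() prints R code that recreates the object
dput(sample_rows)

# Round trip: capture that code as text, then evaluate it to rebuild the object
rebuilt <- eval(parse(text = capture.output(dput(sample_rows))))
all.equal(rebuilt, sample_rows)
```

With the real data, something like dput(head(do_temp, 20)) pasted into the question is usually enough for others to get started.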

This code does the following:

Loads the data from the link you provided. There were several data files, so I had to guess which one you meant.
Groups the observations by date.
Puts the observations in increasing order of mean_do.
Removes rows with values of mean_do that are strictly less than 3.
Takes the first ordered observation for each date (this will be the one with the lowest value of mean_do that is greater than or equal to 3).
Renames the column mean_temp to tdo3, since it's the temperature for that date at the depth where the dissolved oxygen level was closest to 3 mg/L without going below it.

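library(tidyverse)

# Read the merged data; drop the unnamed row-index column, which readr calls
# "X1" (versions before 2.0) or "...1" (2.0 and later)
do_temp <- read_csv("https://raw.githubusercontent.com/TRobin82/WaterQuality/main/DateDepthTempDo.csv") %>%
  select(-any_of(c("X1", "...1")))

do_temp %>%
  group_by(date) %>%
  arrange(mean_do) %>%            # sort by dissolved oxygen, lowest first
  filter(mean_do >= 3) %>%        # drop readings strictly below 3 mg/L
  slice_head(n = 1) %>%           # first remaining row for each date
  rename(tdo3 = mean_temp) %>%    # temperature at that reading
  select(date, tdo3)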

Here are the results. They're a bit different from the ones you posted, so I'm not sure if I've misunderstood you or if those were just illustrative and not real results.

# A tibble: 366 x 2
# Groups:   date [366]
   date        tdo3
   <date>     <dbl>
 1 2020-01-01  3.18
 2 2020-01-02  3.18
 3 2020-01-03  3.19
 4 2020-01-04  3.21
 5 2020-01-05  3.21
 6 2020-01-06  3.21
 7 2020-01-07  3.24
 8 2020-01-08  3.28
 9 2020-01-09  3.27
10 2020-01-10  3.28
# ... with 356 more rows
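
As a quick check of the logic, here is a small sketch that runs the same idea on the last six rows of the 2020-01-01 profile from your printout. I'm assuming the depths are exact 0.25 m steps (the tibble display rounds 20.75 to 20.8), and I've used slice_min() as a more compact stand-in for the arrange() plus slice_head() pair. The names toy and tdo3_toy are just for illustration.

```r
library(dplyr)

# Last six rows of the 2020-01-01 profile, copied from the question's printout
# (depths assumed to be exact 0.25 m steps)
toy <- tibble(
  date      = as.Date("2020-01-01"),
  depth     = c(20.75, 21, 21.25, 21.5, 21.75, 22),
  mean_temp = c(3.18, 3.18, 3.22, 3.25, 3.29, 3.32),
  mean_do   = c(4.27, 3.43, 2.60, 1.77, 0.934, 0.100)
)

tdo3_toy <- toy %>%
  group_by(date) %>%
  filter(mean_do >= 3) %>%                          # never go below 3 mg/L
  slice_min(mean_do, n = 1, with_ties = FALSE) %>%  # smallest remaining DO per date
  ungroup() %>%
  select(date, depth, tdo3 = mean_temp)

print(tdo3_toy)
```

This picks the reading at 21 m (DO 3.43 mg/L, temperature 3.18), which agrees with the first row of the results above.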

Let me know if you were looking for something else.
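
On the for-loop idea from the question: you don't need an explicit loop, even without the tidyverse. Below is a rough base R sketch of the same logic. The helper tdo3_one_day() is a hypothetical name of mine, and the toy data uses three rows from your 2020-01-01 printout plus made-up values for a second date, so treat the numbers as illustrative only.

```r
# Toy data: first date copied from the question's printout,
# second date uses hypothetical values for illustration only
toy <- data.frame(
  date      = as.Date(rep(c("2020-01-01", "2020-01-02"), each = 3)),
  depth     = rep(c(21, 21.25, 21.5), times = 2),
  mean_temp = c(3.18, 3.22, 3.25, 3.19, 3.21, 3.26),
  mean_do   = c(3.43, 2.60, 1.77, 4.10, 3.05, 2.20)
)

# Return the temperature at the smallest DO that is still >= threshold
tdo3_one_day <- function(day, threshold = 3) {
  ok <- day[day$mean_do >= threshold, ]
  if (nrow(ok) == 0) return(NA_real_)   # no depth reaches the threshold that day
  ok$mean_temp[which.min(ok$mean_do)]
}

# split() makes one data frame per date; vapply() applies the helper to each
by_day <- split(toy, toy$date)
tdo3 <- data.frame(
  date = as.Date(names(by_day)),
  tdo3 = unname(vapply(by_day, tdo3_one_day, numeric(1))),
  row.names = NULL
)
print(tdo3)
```

One difference from the pipeline above: a date where every reading is below 3 mg/L comes back as NA here, whereas the filter() step would drop that date from the result entirely.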