I have a dataset grouped by GaugeID, and an nls model that I want to fit to each group separately, returning the fitted parameter for each one.
library(tidyverse)
library(stats)
# sample of data (yearly): gauge ID (grouping variable), water year, then the two formula inputs PETvP and ETvP
# A tibble: 20 x 4
GaugeID WATERYR PETvP ETvP
<chr> <dbl> <dbl> <dbl>
1 06892000 1981 0.854 0.754
2 06892000 1982 0.798 0.708
3 06892000 1983 1.12 0.856
4 06892000 1984 0.905 0.720
5 06892000 1985 0.721 0.618
6 06892000 1986 0.717 0.625
7 06892000 1987 0.930 0.783
8 06892000 1988 1.57 0.945
9 06892000 1989 1.15 0.739
10 06892000 1990 0.933 0.805
11 08171300 1981 0.854 0.754
12 08171300 1982 0.798 0.708
13 08171300 1983 1.12 0.856
14 08171300 1984 0.905 0.720
15 08171300 1985 0.721 0.618
16 08171300 1986 0.717 0.625
17 08171300 1987 0.930 0.783
18 08171300 1988 1.57 0.945
19 08171300 1989 1.15 0.739
20 08171300 1990 0.933 0.805
# attempted for loop
for (i in unique(yearly$GaugeID)) {
myValue = nls(ETvP[i] ~ I(1 + PETvP[i] - (1 + PETvP[i]^(w))^(1/w)), data = yearly,
start = list(w = 2), trace = TRUE)
}
I get the following error:
Error in model.frame.default(formula = ~ETvP + i + PETvP, data = yearly) :
  variable lengths differ (found for 'i')
I haven't found much information about looping with nls. I am fitting a curve for each gauge and need the fitted curve parameter (w) for each gauge. The fit works if I subset the data to a single gauge (e.g., the first one), but not when I run it on the entire data frame with grouped data. For example, this works:
# gaugeA
# A tibble: 10 x 4
GaugeID WATERYR PETvP ETvP
<chr> <dbl> <dbl> <dbl>
1 06892000 1981 0.854 0.754
2 06892000 1982 0.798 0.708
3 06892000 1983 1.12 0.856
4 06892000 1984 0.905 0.720
5 06892000 1985 0.721 0.618
6 06892000 1986 0.717 0.625
7 06892000 1987 0.930 0.783
8 06892000 1988 1.57 0.945
9 06892000 1989 1.15 0.739
10 06892000 1990 0.933 0.805
test = nls(ETvP ~ I(1 + PETvP - (1 + PETvP^(w))^(1/w)), data = gaugeA,
start = list(w = 2), trace = TRUE)
1.574756 (4.26e+00): par = (2)
0.2649549 (1.46e+00): par = (2.875457)
0.09466832 (3.32e-01): par = (3.59986)
0.08543699 (2.53e-02): par = (3.881397)
0.08538308 (9.49e-05): par = (3.907099)
0.08538308 (1.13e-06): par = (3.907001)
> test
Nonlinear regression model
model: ETvP ~ I(1 + PETvP - (1 + PETvP^(w))^(1/w))
data: gaugeA
w
3.907
residual sum-of-squares: 0.08538
Number of iterations to convergence: 5
Achieved convergence tolerance: 1.128e-06
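For reference, here is the model in the nls calls written out as an equation. This assumes, from the column names, that PETvP is PET/P and ETvP is ET/P. It appears to have the form of Fu's equation for the Budyko curve, with w as the only fitted parameter, so each gauge's fit reduces to estimating a single number:

$$
\frac{ET}{P} = 1 + \frac{PET}{P} - \left[ 1 + \left( \frac{PET}{P} \right)^{w} \right]^{1/w}
$$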
Any ideas on how I can get these per-gauge results for my entire grouped data frame? It contains over 600 gauges. Thank you in advance.
Answer:
Either of the following will work:
Using summarise:
yearly %>%
group_by(GaugeID) %>%
summarise(result = list(nls(ETvP ~ I(1 + PETvP - (1 + PETvP^(w))^(1/w)),
data = cur_data(),
start = list(w = 2)))) %>%
pull(result)
[[1]]
Nonlinear regression model
model: ETvP ~ I(1 + PETvP - (1 + PETvP^(w))^(1/w))
data: cur_data()
w
3.607
residual sum-of-squares: 0.01694
Number of iterations to convergence: 5
Achieved convergence tolerance: 7.11e-08
[[2]]
Nonlinear regression model
model: ETvP ~ I(1 + PETvP - (1 + PETvP^(w))^(1/w))
data: cur_data()
w
1.086
residual sum-of-squares: 0.1532
Number of iterations to convergence: 5
Achieved convergence tolerance: 2.685e-07
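If only the fitted w is needed per gauge (as in the question), a minimal sketch is to pull the coefficient out inside summarise, which gives a tidy one-row-per-gauge table instead of a list of model objects. This assumes dplyr >= 1.1.0 for `pick()` (on older versions, `cur_data()` as above plays the same role), and the `yearly` data below is only a stand-in typed in from the sample shown in the question. Because the two sample gauges have identical values, they should get the same w.

```r
library(dplyr)

# Stand-in for `yearly`, typed in from the sample printed in the question
yearly <- tibble(
  GaugeID = rep(c("06892000", "08171300"), each = 10),
  WATERYR = rep(1981:1990, times = 2),
  PETvP   = rep(c(0.854, 0.798, 1.12, 0.905, 0.721, 0.717, 0.930, 1.57, 1.15, 0.933), 2),
  ETvP    = rep(c(0.754, 0.708, 0.856, 0.720, 0.618, 0.625, 0.783, 0.945, 0.739, 0.805), 2)
)

w_by_gauge <- yearly %>%
  group_by(GaugeID) %>%
  summarise(
    # pick() hands nls only the current gauge's rows; keep just the fitted w
    w = coef(nls(ETvP ~ 1 + PETvP - (1 + PETvP^w)^(1/w),
                 data = pick(PETvP, ETvP),
                 start = list(w = 2)))[["w"]]
  )

w_by_gauge
```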
Using map:
yearly %>%
group_split(GaugeID) %>%
map(~nls(ETvP ~ I(1 + PETvP - (1 + PETvP^(w))^(1/w)),
data = .x,
start = list(w = 2)))
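For comparison with the for loop in the question: that loop fails because nls treats `i` in `ETvP[i]` as another data variable, and its length (1) does not match the data columns. The indexing does not select that gauge's rows. A minimal base R sketch of a working loop subsets the rows first and passes only those to `data =`, just like the single-gauge example that works. The `yearly` data frame is again a stand-in built from the sample in the question, and `w_values` is a hypothetical name for the results.

```r
# Stand-in for `yearly`, typed in from the sample printed in the question
yearly <- data.frame(
  GaugeID = rep(c("06892000", "08171300"), each = 10),
  WATERYR = rep(1981:1990, times = 2),
  PETvP   = rep(c(0.854, 0.798, 1.12, 0.905, 0.721, 0.717, 0.930, 1.57, 1.15, 0.933), 2),
  ETvP    = rep(c(0.754, 0.708, 0.856, 0.720, 0.618, 0.625, 0.783, 0.945, 0.739, 0.805), 2)
)

gauges <- unique(yearly$GaugeID)
w_values <- setNames(numeric(length(gauges)), gauges)  # one named slot per gauge

for (g in gauges) {
  # select this gauge's rows, then fit exactly as for a single gauge
  gauge_data <- yearly[yearly$GaugeID == g, ]
  fit <- nls(ETvP ~ 1 + PETvP - (1 + PETvP^w)^(1/w),
             data = gauge_data, start = list(w = 2))
  w_values[[g]] <- coef(fit)[["w"]]
}

w_values
```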
Answer:
I usually prefer purrr and dplyr for looping functions over grouped data.
I can't edit the data, but maybe this works:
library(dplyr)
library(purrr)
yearly %>%
  group_by(GaugeID) %>%
  # fit each gauge's own rows (cur_data()) and store each model in a list-column
  summarise(test = list(nls(ETvP ~ I(1 + PETvP - (1 + PETvP^(w))^(1/w)),
                            data = cur_data(), start = list(w = 2), trace = TRUE)))
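With over 600 gauges, it is likely that nls will fail to converge for at least one of them, and a single error stops any of the approaches above. One possible sketch, assuming it is acceptable to record NA for those gauges, wraps the fit in purrr's `possibly()` and maps it over the per-gauge subsets. The helper name `fit_w` is hypothetical, and `yearly` is again a stand-in built from the sample in the question.

```r
library(dplyr)
library(purrr)

# Stand-in for `yearly`, typed in from the sample printed in the question
yearly <- tibble(
  GaugeID = rep(c("06892000", "08171300"), each = 10),
  WATERYR = rep(1981:1990, times = 2),
  PETvP   = rep(c(0.854, 0.798, 1.12, 0.905, 0.721, 0.717, 0.930, 1.57, 1.15, 0.933), 2),
  ETvP    = rep(c(0.754, 0.708, 0.856, 0.720, 0.618, 0.625, 0.783, 0.945, 0.739, 0.805), 2)
)

# Hypothetical helper: fitted w for one gauge's rows. possibly() turns any
# error (e.g. a fit that does not converge) into NA instead of stopping the run.
fit_w <- possibly(
  function(d) {
    fit <- nls(ETvP ~ 1 + PETvP - (1 + PETvP^w)^(1/w),
               data = d, start = list(w = 2))
    coef(fit)[["w"]]
  },
  otherwise = NA_real_
)

# split() gives a list of per-gauge tibbles named by GaugeID;
# map_dbl() returns a numeric vector of w values with the same names
w_by_gauge <- map_dbl(split(yearly, yearly$GaugeID), fit_w)
w_by_gauge
```

Gauges that come back as NA can then be refitted separately, for example with a different starting value for w.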