I am working with a large data set of longitudinal measurements. To simplify things, here is an example. Let's say a study measured rainfall in specific cities over a period of time. Below is an example data set imported into R. Note that some cities have fewer measurements than others, and the data are fairly irregular. The years in which the measurements were taken differ between cities, so each time point can be treated simply as an observation number.
Here is roughly what the data look like in R:
City Time.point Total.rain
City1 1 0.50
City1 2 0.70
City1 3 0.60
City1 4 0.40
City1 5 0.60
City1 6 0.20
City2 1 1.00
City2 2 0.80
City2 3 0.50
City2 4 0.80
City3 1 1.00
City3 2 1.20
City3 3 1.20
City4 1 0.30
City4 2 0.20
City4 3 0.30
City4 4 0.50
City4 5 0.10
City4 6 0.01
City4 7 0.02
City5 1 0.10
City5 2 0.15
City5 3 0.30
City5 4 0.30
City5 5 0.25
City5 6 0.30
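For anyone who wants to run code against this example, here is a minimal sketch that rebuilds the table above as a data frame. The object name `dat` is my own choice for illustration; the values are copied from the table.

```r
# Rebuild the example table as a data frame.
# `dat` is an illustrative name, not something fixed by the study.
dat <- data.frame(
  City = rep(paste0("City", 1:5), times = c(6, 4, 3, 7, 6)),
  Time.point = c(1:6, 1:4, 1:3, 1:7, 1:6),
  Total.rain = c(0.50, 0.70, 0.60, 0.40, 0.60, 0.20,              # City1
                 1.00, 0.80, 0.50, 0.80,                          # City2
                 1.00, 1.20, 1.20,                                # City3
                 0.30, 0.20, 0.30, 0.50, 0.10, 0.01, 0.02,        # City4
                 0.10, 0.15, 0.30, 0.30, 0.25, 0.30)              # City5
)

# Number of observations per city
table(dat$City)
```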
How would I find the slope of the best-fit regression line for each city? I do not want to compare cities, just compute the slope for each city and create a new data set with a single row per city, something like the one below (if I did it correctly by hand).
City Regression.slope
City1 -0.05714286
City2 -0.09000000
City3 0.10000000
City4 -0.05071429
City5 0.03714286
Any help would be very much appreciated.
CodePudding user response:
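I think this can be simplified greatly, to a single line of code in either framework.
tidyverse:
library(dplyr)
dat %>% group_by(City) %>% summarize(est = lm(Total.rain ~ Time.point)$coef[2])
data.table (dat must be a data.table, e.g. after setDT(dat)):
library(data.table)
dat[, .(est = lm(Total.rain ~ Time.point)$coef[2]), by = .(City)]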
Output:
City est
<char> <num>
1: City1 -0.05714286
2: City2 -0.09000000
3: City3 0.10000000
4: City4 -0.05071429
5: City5 0.03714286
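If you would rather avoid extra packages, the same per-city fit can be sketched in base R with `split()` and `sapply()`. This is only a sketch: the data frame is rebuilt from the question so the block runs on its own, and the names `dat`, `slopes`, and `result` are assumptions for illustration. It pulls the coefficient by name rather than by position, which guards against picking the wrong term if the formula ever changes.

```r
# Example data from the question (object names here are illustrative)
dat <- data.frame(
  City = rep(paste0("City", 1:5), times = c(6, 4, 3, 7, 6)),
  Time.point = c(1:6, 1:4, 1:3, 1:7, 1:6),
  Total.rain = c(0.50, 0.70, 0.60, 0.40, 0.60, 0.20,
                 1.00, 0.80, 0.50, 0.80,
                 1.00, 1.20, 1.20,
                 0.30, 0.20, 0.30, 0.50, 0.10, 0.01, 0.02,
                 0.10, 0.15, 0.30, 0.30, 0.25, 0.30)
)

# Fit one simple linear regression per city and keep only the slope
slopes <- sapply(split(dat, dat$City), function(d) {
  coef(lm(Total.rain ~ Time.point, data = d))[["Time.point"]]
})

# One row per city, using the column names from the question
result <- data.frame(City = names(slopes),
                     Regression.slope = unname(slopes))
result
```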
CodePudding user response:
I think this gets you there. Double-check your City5 calculation; I get a different slope for that city. :-)
library(dplyr)
library(tidyr)   # provides nest() and unnest()
library(purrr)
# nest each city's rows into its own data frame
df_City <- df %>%
  group_by(City) %>%
  nest()
# set up the regression model
model <- function(df) {
  lm(Total.rain ~ Time.point, data = df)
}
# add the fitted model as another column in the data frame
data_City <- df_City %>%
  mutate(model = purrr::map(data, model))
# extract the results using the broom package into separate columns
data_all <- data_City %>%
  mutate(results = purrr::map(model, broom::tidy)) %>%
  unnest(results)   # the .drop argument is deprecated as of tidyr 1.0.0
# filter/select for the sought-after values
data_all %>%
  filter(term == "Time.point") %>%
  select(City, estimate)
City estimate
<chr> <dbl>
1 City1 -0.0571
2 City2 -0.0900
3 City3 0.100
4 City4 -0.0507
5 City5 0.0414
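Since the City5 value is in question, one way to check any of these numbers is the closed-form slope for simple linear regression, cov(x, y) / var(x), which needs no model fitting. The sketch below assumes the data exactly as posted in the question (rebuilt here so the block runs on its own; `dat`, `by_hand`, and `from_lm` are illustrative names). For those data it gives about 0.0371 for City5, so a different value suggests the input data were not identical.

```r
# Example data as posted in the question (object names are illustrative)
dat <- data.frame(
  City = rep(paste0("City", 1:5), times = c(6, 4, 3, 7, 6)),
  Time.point = c(1:6, 1:4, 1:3, 1:7, 1:6),
  Total.rain = c(0.50, 0.70, 0.60, 0.40, 0.60, 0.20,
                 1.00, 0.80, 0.50, 0.80,
                 1.00, 1.20, 1.20,
                 0.30, 0.20, 0.30, 0.50, 0.10, 0.01, 0.02,
                 0.10, 0.15, 0.30, 0.30, 0.25, 0.30)
)

groups <- split(dat, dat$City)

# Closed-form least-squares slope: cov(x, y) / var(x)
by_hand <- sapply(groups, function(d) {
  cov(d$Time.point, d$Total.rain) / var(d$Time.point)
})

# Slope from lm() for comparison
from_lm <- sapply(groups, function(d) {
  coef(lm(Total.rain ~ Time.point, data = d))[["Time.point"]]
})

# The two approaches should agree up to floating-point error
all.equal(by_hand, from_lm)
round(by_hand, 4)
```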