Finding the regression and slope of the regression line for specific rows in R

Time:03-02

I am working with a large data set of longitudinal measurements. To simplify, here is an example. Let's say a study measured rainfall in specific cities over a period of time. Below is an example data set imported into R. Note that some cities have fewer measurements than others, and the data are somewhat scattered. The years in which the data were collected are not the same for every city, so we can simply number them as observations.

Here is roughly what the data look like in R:

        City          Time.point          Total.rain
        City1            1                    0.50
        City1            2                    0.70
        City1            3                    0.60
        City1            4                    0.40
        City1            5                    0.60
        City1            6                    0.20
        City2            1                    1.00
        City2            2                    0.80
        City2            3                    0.50
        City2            4                    0.80
        City3            1                    1.00
        City3            2                    1.20
        City3            3                    1.20
        City4            1                    0.30
        City4            2                    0.20
        City4            3                    0.30
        City4            4                    0.50
        City4            5                    0.10
        City4            6                    0.01
        City4            7                    0.02
        City5            1                    0.10
        City5            2                    0.15
        City5            3                    0.30
        City5            4                    0.30
        City5            5                    0.25
        City5            6                    0.30
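
For reproducibility, the table above can be entered as a data frame. This is only a sketch: the object name `dat` is an assumption (it matches the name used in the first answer further down), and the values are copied from the table.

```r
# Recreate the example table as a data frame (the name `dat` is an assumption).
dat <- data.frame(
  City = rep(paste0("City", 1:5), times = c(6, 4, 3, 7, 6)),
  Time.point = c(1:6, 1:4, 1:3, 1:7, 1:6),
  Total.rain = c(
    0.50, 0.70, 0.60, 0.40, 0.60, 0.20,        # City1
    1.00, 0.80, 0.50, 0.80,                    # City2
    1.00, 1.20, 1.20,                          # City3
    0.30, 0.20, 0.30, 0.50, 0.10, 0.01, 0.02,  # City4
    0.10, 0.15, 0.30, 0.30, 0.25, 0.30         # City5
  )
)

# Number of observations per city (they differ from city to city)
table(dat$City)
```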

How would I find the regression, that is, the slope of the best-fit line, for each city? I do not want to compare cities, just compute the slope for each city and create a new data set with a single row per city, something like the one below (if I did it correctly by hand).

       City            Regression.slope
       City1             -0.05714286
       City2             -0.09000000
       City3              0.10000000
       City4             -0.05071429
       City5              0.03714286
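
As a sanity check on the hand calculation, the least-squares slope for a single city can be computed directly from its definition and compared with `lm()`. The sketch below uses the City3 values from the table; the variable names `x`, `y`, `slope_by_hand` and `slope_lm` are only illustrative.

```r
# City3 values copied from the table above
x <- c(1, 2, 3)           # Time.point
y <- c(1.00, 1.20, 1.20)  # Total.rain

# Least-squares slope: sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
slope_by_hand <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Same quantity from lm(); the second coefficient is the slope
slope_lm <- unname(coef(lm(y ~ x))[2])

# Compare the two values up to floating-point tolerance
all.equal(slope_by_hand, slope_lm)
```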

Any help would be very much appreciated.

CodePudding user response:

I think this can be simplified greatly, to a single line of code:

tidyverse

library(dplyr)
dat %>% group_by(City) %>% summarize(est = coef(lm(Total.rain ~ Time.point))[2])

data.table

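library(data.table)
# as.data.table() is a no-op copy if dat is already a data.table
as.data.table(dat)[, .(est = coef(lm(Total.rain ~ Time.point))[2]), by = .(City)]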

Output:

     City         est
   <char>       <num>
1:  City1 -0.05714286
2:  City2 -0.09000000
3:  City3  0.10000000
4:  City4 -0.05071429
5:  City5  0.03714286
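
The same per-city fit can also be sketched in base R, without extra packages, by splitting the data by city and fitting one `lm()` per piece. The names `dat`, `slopes` and `result` are assumptions for illustration, and the data are re-entered so that the block runs on its own.

```r
# Example data re-entered from the question (the name `dat` is an assumption)
dat <- data.frame(
  City = rep(paste0("City", 1:5), times = c(6, 4, 3, 7, 6)),
  Time.point = c(1:6, 1:4, 1:3, 1:7, 1:6),
  Total.rain = c(0.50, 0.70, 0.60, 0.40, 0.60, 0.20,
                 1.00, 0.80, 0.50, 0.80,
                 1.00, 1.20, 1.20,
                 0.30, 0.20, 0.30, 0.50, 0.10, 0.01, 0.02,
                 0.10, 0.15, 0.30, 0.30, 0.25, 0.30)
)

# Split into one data frame per city, fit a line to each, keep only the slope
slopes <- sapply(split(dat, dat$City), function(d) {
  unname(coef(lm(Total.rain ~ Time.point, data = d))[2])
})

# One row per city, using the column names from the question
result <- data.frame(City = names(slopes), Regression.slope = unname(slopes))
result
```

`split()` plus `sapply()` does the same job as the grouping step in the two one-liners above; which form to use is mostly a matter of which packages are already loaded.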

CodePudding user response:

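I think this gets you there. Note that the City5 estimate in the output below (0.0414) differs from the hand calculation (0.0371). For the City5 values as posted, the slope works out to 0.65/17.5 ≈ 0.0371, so the hand calculation is right; 0.0414 is what you get if the fifth City5 value is entered as 0.30 instead of 0.25.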

library(dplyr)
library(tidyr)  # nest() and unnest() come from tidyr, not dplyr
library(purrr)

#nest each city into a data frame
df_City <- df %>%
    group_by(City) %>%
    nest()
    
#set up the regression model (named fit_model so it does not clash with the model column)
fit_model <- function(df) {
    lm(Total.rain ~ Time.point, data = df)
}

#add the fitted model as another column in the data frame
data_City <- df_City %>%
    mutate(model = purrr::map(data, fit_model))

#extract the results using the broom package into separate columns
data_all <- data_City %>% 
    mutate(results = purrr::map(model, broom::tidy)) %>% 
    unnest(results)  # the .drop argument is deprecated as of tidyr 1.0.0

#filter/select for the sought after values
data_all %>%
    filter(term == "Time.point") %>%
    select(City, estimate)


Output:

  City  estimate
  <chr>    <dbl>
1 City1  -0.0571
2 City2  -0.0900
3 City3   0.100 
4 City4  -0.0507
5 City5   0.0414