Extract for each Territory a linear regression model - R

Time:03-23

I have the following problem.

From a starting matrix called "R", I want to derive a linear model for each "Territorio" (territory) that relates two variables, "Morti_per_abitante" and "PL_per_abitante", across different years ("Anno").


To do this I created an empty table called TT in which, for each territory, I can enter the values of the linear model: the slope ("Tasso") and its significance ("significativita").

At this point I created a for loop that, for each of the 13 territories listed in the TT table, should fit the model and store its 2 parameters:

for (i in 1:13){
  Ter<-TT[i:1]
  filter(R, Territorio==Ter) %>%
    Tasso[1:2]<-lm(Morti_per_abitante ~ PL_per_abitante,R)$coefficents[2]
    }

The code does not work starting from the inserted filter.

What can I do to correct the line of code?

Is there an alternative way to have the linear model for each Territory?

Thanks

CodePudding user response:

Here is an approach using dplyr and broom, demonstrated on a mock data frame R.

library(broom)
library(dplyr)

set.seed(123)

R <- data.frame(Territorio = rep(c("Lazio", "Puglia"), c(10, 5)),
                Anno = sample(2015:2020, size = 15, replace = TRUE),
                Morti_per_abitante = rnorm(15),
                PL_per_abitante = rnorm(15))


R |>
  group_by(Territorio) |>
  group_modify(~{
    tidy(lm(Morti_per_abitante ~ PL_per_abitante, data = .x))
  }) |>
  filter(term == "PL_per_abitante") |>
  rename(Tasso = estimate, Significativita = p.value) |>
  select(Territorio, Tasso, Significativita)

#>   # A tibble: 2 × 3
#> # Groups:   Territorio [2]
#>   Territorio  Tasso Significativita
#>   <chr>       <dbl>           <dbl>
#> 1 Lazio      -0.339           0.367
#> 2 Puglia      0.239           0.592
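
If you would rather keep the for loop from the question, here is a minimal base R sketch of how it could be repaired. It assumes TT is a data frame with one row per territory and columns Territorio, Tasso and Significativita; that layout is a guess, since the original table is not shown. The loop in the question has several problems: on a data frame, TT[i:1] selects columns i down to 1 rather than the i-th territory name, the result of filter() is piped into an assignment, lm() is fitted on the whole of R instead of the filtered rows, and coefficents is misspelled, so it returns NULL.

```r
# Mock data with the same column names as in the question (values are made up).
set.seed(123)
R <- data.frame(Territorio = rep(c("Lazio", "Puglia"), c(10, 5)),
                Anno = sample(2015:2020, size = 15, replace = TRUE),
                Morti_per_abitante = rnorm(15),
                PL_per_abitante = rnorm(15))

# Results table (assumed layout): one row per territory, filled in by the loop.
TT <- data.frame(Territorio = unique(R$Territorio),
                 Tasso = NA_real_,
                 Significativita = NA_real_)

for (i in seq_len(nrow(TT))) {
  # Keep only the rows of the current territory.
  sub <- R[R$Territorio == TT$Territorio[i], ]
  # Fit the model on the subset, not on the full data.
  fit <- lm(Morti_per_abitante ~ PL_per_abitante, data = sub)
  # Coefficient table: pick the slope row by name, then estimate and p-value.
  coefs <- summary(fit)$coefficients
  TT$Tasso[i] <- coefs["PL_per_abitante", "Estimate"]
  TT$Significativita[i] <- coefs["PL_per_abitante", "Pr(>|t|)"]
}

TT
```

Using seq_len(nrow(TT)) instead of a hard-coded 1:13 keeps the loop valid if the number of territories changes.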
