How to run multiple lm models in R and generate a new df?

I have the following df, and for each player I need to fit the following regression model:

$$\ln(\text{score})_t = \beta_1 + \beta_2 \, \text{time\_playing}_t$$

My code and an example of the df look like this:

```
library(tidyverse)
library(broom)

df_players <- read.csv("https://github.com/rhozon/datasets/raw/master/data_test_players.csv", header = TRUE, sep = ";") %>% 
  glimpse()

#> Rows: 105
#> Columns: 3
#> $ player       <chr> "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a"…
#> $ time_playing <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 1,…
#> $ score        <int> 7, 5, 2, 3, 10, 8, 7, 10, 10, 3, 8, 5, 2, 5, 6, 9, 9, 8, 9, 4, 6, 4, 9, 8, 8, 5, 2, 10, 9, 5, 7, 4, 5, 8, 10, 2, 3, 8, 8, 5, 7, 6, 10…

```

The desired dataframe is something like:

```
df
  player       beta_2
1      a  0.005958000
2      b -0.004110000
3      c  0.000390777
```

How can I use the lm function to estimate the beta_2 coefficient for each player and collect the results in a data frame like the one shown above?

CodePudding user response：

Most of what you need is in this solution, but here is an answer tailored to your case:

```
library(dplyr)

## Create data following your structure
n <- 20  # Number of observations per player
N <- 10  # Number of players

# Simulate data
df <- tibble(
    player = rep(letters[1:N], each = n),
    time_playing = rnorm(n * N),
    e_i = rnorm(n * N),
    beta_2 = rep(runif(N), each = n),  # one true slope per player
    score = exp(beta_2 * time_playing + e_i)
)

## Estimate table of betas
betatbl <- df %>%
    group_by(player) %>%
    # log(score) is the response, matching the model in the question and the simulation above
    do(regs = lm(log(score) ~ time_playing, data = .)) %>%
    mutate(
        beta1 = coef(regs)[1],  # intercept
        beta2 = coef(regs)[2]   # slope on time_playing
    )
```
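Note that `betatbl` keeps the fitted models in the `regs` list column, so it is not yet the two-column table asked for. If you would rather avoid the list column, the same per-player fit can be sketched in base R with `split()` and `lapply()`. The data below are simulated and are not the asker's file: `sim`, `true_beta`, the intercept of 1, the noise level and the seed are all assumptions made up for illustration.

```r
set.seed(42)  # assumption: fixed seed so the sketch is reproducible

n <- 20                                          # observations per player
true_beta <- c(a = 0.05, b = -0.03, c = 0.01)    # hypothetical slopes, one per player
N <- length(true_beta)

# Simulated stand-in for df_players: ln(score) is linear in time_playing
sim <- data.frame(
  player = rep(names(true_beta), each = n),
  time_playing = rep(seq_len(n), times = N)
)
slope <- unname(true_beta[as.character(sim$player)])
sim$score <- exp(1 + slope * sim$time_playing + rnorm(n * N, sd = 0.05))

# Fit one model per player: split the rows by player, then lm() on each piece
fits <- lapply(split(sim, sim$player), function(d) {
  lm(log(score) ~ time_playing, data = d)
})

# Collect the slope on time_playing (beta_2) into a plain two-column data frame
betas <- data.frame(
  player = names(fits),
  beta_2 = vapply(fits, function(m) coef(m)[["time_playing"]], numeric(1)),
  row.names = NULL
)
print(betas)
```

With little noise in the simulation, the estimated `beta_2` values should land close to the hypothetical slopes in `true_beta`. Swapping `sim` for `df_players` should give the table from the question, provided every `score` is positive so that `log()` is defined.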

CodePudding user response：

There might be several ways to do it. This is one of them:

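```
library(tidyverse)

# One row per player, with that player's observations in the list column `data`
df <- df_players %>% group_by(player) %>% nest()

# Fit the model from the question (ln(score) on time_playing) and return the coefficients as a tibble
my_lm <- function(df) {
    lm(log(score) ~ time_playing, data = df) %>% broom::tidy()
}

# Fit each player's model and keep only the slope (beta_2) row
df %>% mutate(coefs = map(data, my_lm)) %>% 
    unnest(coefs) %>% filter(term == "time_playing")
```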
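The result above still carries the nested `data` column and the extra columns that `broom::tidy()` returns (`term`, `std.error`, `statistic`, `p.value`). A minimal sketch of trimming it to the `player` / `beta_2` layout from the question is below. It uses a simulated tibble `sim` as a stand-in for `df_players`. The 35 rows per player and scores between 2 and 10 are assumptions read off the `glimpse()` output, and the seed is arbitrary.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible

# Simulated stand-in for df_players (same column names, made-up values)
sim <- tibble(
  player = rep(c("a", "b", "c"), each = 35),
  time_playing = rep(1:35, times = 3),
  score = sample(2:10, size = 105, replace = TRUE)
)

betas <- sim %>%
  nest(data = -player) %>%                         # one row per player
  mutate(coefs = map(data, ~ tidy(lm(log(score) ~ time_playing, data = .x)))) %>%
  unnest(coefs) %>%
  filter(term == "time_playing") %>%               # keep the slope row only
  select(player, beta_2 = estimate)                # rename estimate to beta_2

print(betas)
```

Because the simulated scores do not depend on `time_playing`, the slopes here should be close to zero. The point of the sketch is the shape of the output, not the values.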