I have the following df and I need to run the following regression model for each player:
ln(score)_t = \beta_1 + \beta_2 \mbox{time_playing}_t
My code and an example df look something like this:
```
library(tidyverse)
library(broom)
df_players <- read.csv("https://github.com/rhozon/datasets/raw/master/data_test_players.csv", header = TRUE, sep = ";") %>%
  glimpse()
Rows: 105
Columns: 3
$ player <chr> "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a"…
$ time_playing <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 1,…
$ score <int> 7, 5, 2, 3, 10, 8, 7, 10, 10, 3, 8, 5, 2, 5, 6, 9, 9, 8, 9, 4, 6, 4, 9, 8, 8, 5, 2, 10, 9, 5, 7, 4, 5, 8, 10, 2, 3, 8, 8, 5, 7, 6, 10…
```
The desired dataframe is something like:
```
df
player beta_2
1 a 0.005958000
2 b -0.004110000
3 c 0.000390777
```
How can I use the lm function to estimate the beta_2 coefficient for each player and collect the results in a data frame like the desired one shown above?
CodePudding user response:
Most of what you need is in this solution, but here is an answer tailored to your case:
library(dplyr)
## Create data following your structure
n <- 20 # Number of observations per player
N <- 10 # Number of players
# Simulate data
set.seed(1) # make the simulation reproducible
df <- tibble(
  player = rep(letters[1:N], each = n),
  time_playing = rnorm(n * N),
  e_i = rnorm(n * N),
  beta_2 = rep(runif(N), each = n),
  score = exp(beta_2 * time_playing + e_i)
)
## Estimate table of betas
betatbl <- df %>%
  group_by(player) %>%
  # `.` is the data of the current group; the model is in log(score)
  do(regs = lm(log(score) ~ time_playing, data = .)) %>%
  mutate(
    beta1 = coef(regs)[1],
    beta2 = coef(regs)[2]
  )
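If you would rather not rely on `do()`, the same per-player fit can be sketched in base R with `split()` and `lapply()`. This is only a sketch under assumptions: `df_sim` is simulated stand-in data that reuses the question's column names (it is not the real CSV), and `fits` and `betas` are names chosen here for illustration.

```r
# Simulated stand-in for df_players (hypothetical values, same column names)
set.seed(1)
df_sim <- data.frame(
  player = rep(c("a", "b", "c"), each = 35),
  time_playing = rep(1:35, times = 3),
  score = sample(2:10, size = 105, replace = TRUE)
)

# Fit ln(score) = beta_1 + beta_2 * time_playing separately for each player
fits <- lapply(
  split(df_sim, df_sim$player),
  function(d) lm(log(score) ~ time_playing, data = d)
)

# Collect the slope (beta_2) of every fit into one data frame
betas <- data.frame(
  player = names(fits),
  beta_2 = vapply(fits, function(m) coef(m)[["time_playing"]], numeric(1)),
  row.names = NULL
)
print(betas)
```

Replacing `df_sim` with `df_players` should give one `beta_2` row per player, as in the desired output.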
CodePudding user response:
There might be several ways to do it. This is one of them:
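# Assumes library(tidyverse) is loaded, as in the question
df <- df_players %>% group_by(player) %>% nest()
# Fit the log-linear model on one player's data and return tidy coefficients
my_lm <- function(df) {
  lm(log(score) ~ time_playing, data = df) %>% broom::tidy()
}
df %>% mutate(coefs = map(data, my_lm)) %>%
  unnest(coefs) %>% filter(term == "time_playing") %>%
  select(player, beta_2 = estimate)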
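When only the slope is needed, a shorter variant is to compute it directly inside `summarise()`. Again a hedged sketch rather than a definitive solution: `df_sim` is simulated stand-in data with the question's column names, and `betas` is a name made up for the example. It relies on `lm()` finding `score` and `time_playing` in the current group when called inside `summarise()`.

```r
library(dplyr)

# Simulated stand-in for df_players (hypothetical values, same column names)
set.seed(42)
df_sim <- tibble(
  player = rep(c("a", "b", "c"), each = 35),
  time_playing = rep(1:35, times = 3),
  score = sample(2:10, size = 105, replace = TRUE)
)

# One row per player: the slope of ln(score) on time_playing
betas <- df_sim %>%
  group_by(player) %>%
  summarise(
    beta_2 = coef(lm(log(score) ~ time_playing))[["time_playing"]],
    .groups = "drop"
  )
print(betas)
```

The result has exactly the two columns of the desired data frame, `player` and `beta_2`, so no `filter()` or `unnest()` step is needed.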