What's the easiest way to duplicate observations in a dataset while making a small change to ea-CodePudding

first-time poster, so I hope I'm giving enough information, and I hope that this question isn't an obvious duplicate (I ran a search and couldn't find exactly what I needed). If I can clarify anything, please let me know!

I'm an R novice working on a group project using NBA play-by-play data, and my group is looking at the effect of the shot clock on certain possession outcomes. Right now, we have dummy variables for every second elapsed on the shot clock in each row. For example, a possession that lasted 10 seconds would have a value of 1 for the dummies 24, 23, 22, 21, 20, 19, 18, 17, 16, and 15 and a zero for the remaining dummies 14-1.

Here's a simplified sample of what the current dataset would look like:

PossessionID	OffensiveTeam	DefensiveTeam	Points	24	23	22	21	20
12345	CHI	IND	2	1	1	1	0	0
12346	IND	CHI	0	1	1	0	0	0
12347	CHI	IND	3	1	1	1	1	1

And here's what we need it to look like for those same 3 entries from above:

PossessionID	OffensiveTeam	DefensiveTeam	Points	24	23	22	21	20
12345	CHI	IND	2	1	0	0	0	0
12345	CHI	IND	2	0	1	0	0	0
12345	CHI	IND	2	0	0	1	0	0
12346	IND	CHI	0	1	0	0	0	0
12346	IND	CHI	0	0	1	0	0	0
12347	CHI	IND	3	1	0	0	0	0
12347	CHI	IND	3	0	1	0	0	0
12347	CHI	IND	3	0	0	1	0	0
12347	CHI	IND	3	0	0	0	1	0
12347	CHI	IND	3	0	0	0	0	1

In case the difference isn't clear, we need to create rows such that each row has only one of the dummies equal 1 and hold the rest of the values in the row constant. Note how the 1s "cascade" down with each new row, but the rest of the values are the same for each row from the original dataset.

Does anyone have a suggestion for how best to create these new rows? Ideally would be one that uses nothing beyond base R, tidyverse, and dplyr. Thanks so much!

CodePudding user response：

df %>%
  pivot_longer(-(1:4))%>%
  filter(value>0) %>%
  mutate(nm = name) %>%
  pivot_wider(values_fill = 0) %>%
  select(-nm)

# A tibble: 10 x 9
   PossessionID OffensiveTeam DefensiveTeam Points  `24`  `23`  `22`  `21`  `20`
          <int> <chr>         <chr>          <int> <int> <int> <int> <int> <int>
 1        12345 CHI           IND                2     1     0     0     0     0
 2        12345 CHI           IND                2     0     1     0     0     0
 3        12345 CHI           IND                2     0     0     1     0     0
 4        12346 IND           CHI                0     1     0     0     0     0
 5        12346 IND           CHI                0     0     1     0     0     0
 6        12347 CHI           IND                3     1     0     0     0     0
 7        12347 CHI           IND                3     0     1     0     0     0
 8        12347 CHI           IND                3     0     0     1     0     0
 9        12347 CHI           IND                3     0     0     0     1     0
10        12347 CHI           IND                3     0     0     0     0     1

CodePudding user response：

Here is another approach:

Small function f(n) return a tibble of size n rows by 24 columns, with 1's on the diagonal

f <- function(n) {
 m=matrix(rep(0,n*24), nrow=n,ncol=24)
 diag(m) <- 1
 setNames(as_tibble(m), paste0("v",24:1))
}

Apply the function to the original data and unnest

k %>% 
  rowwise() %>%
  mutate(vars = list(f(sum(c_across(`24`:`20`))))) %>% 
  select(-(5:9)) %>% 
  unnest(vars)

Output:

   PossessionID OffensiveTeam DefensiveTeam Points   v24   v23   v22   v21   v20   v19   v18   v17   v16   v15   v14   v13   v12   v11   v10    v9    v8    v7    v6
          <int> <chr>         <chr>          <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1        12345 CHI           IND                2     1     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
 2        12345 CHI           IND                2     0     1     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
 3        12345 CHI           IND                2     0     0     1     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
 4        12346 IND           CHI                0     1     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
 5        12346 IND           CHI                0     0     1     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
 6        12347 CHI           IND                3     1     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
 7        12347 CHI           IND                3     0     1     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
 8        12347 CHI           IND                3     0     0     1     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
 9        12347 CHI           IND                3     0     0     0     1     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
10        12347 CHI           IND                3     0     0     0     0     1     0     0     0     0     0     0     0     0     0     0     0     0     0     0
# … with 5 more variables: v5 <dbl>, v4 <dbl>, v3 <dbl>, v2 <dbl>, v1 <dbl>

Input:

k = structure(list(PossessionID = 12345:12347, OffensiveTeam = c("CHI", 
"IND", "CHI"), DefensiveTeam = c("IND", "CHI", "IND"), Points = c(2L, 
0L, 3L), `24` = c(1L, 1L, 1L), `23` = c(1L, 1L, 1L), `22` = c(1L, 
0L, 1L), `21` = c(0L, 0L, 1L), `20` = c(0L, 0L, 1L)), row.names = c(NA, 
-3L), class = "data.frame")