Transform dataframe to repeat rows for categorical variables in R

Time:11-13

I have a dataframe of the following form:

id   A  B  C
 1   2  1  3
 2   2  1  1
 3   1  1  3
 .   .  .  .
 .   .  .  .

id is a unique identifier and A is a categorical variable with p levels. What is the most straightforward way to transform the dataframe so that each row is repeated p times and A becomes a dummy variable, equal to 1 in the row corresponding to the level of A for that id and 0 otherwise? For example, with 3 levels of A, the dataframe above would become:

id   A  B  C
 1   0  1  3
 1   1  1  3
 1   0  1  3
 2   0  1  1
 2   1  1  1
 2   0  1  1
 3   1  1  3
 3   0  1  3
 3   0  1  3
 .   .  .  .
 .   .  .  .

Apologies if the title didn't specify the nature of this problem properly or it has already been asked: I'm not well versed in R so I don't really know how to ask this question in a concise way or search for it. Thanks!

CodePudding user response:

Use uncount to replicate each row p times, then group by 'id' and replace 'A' with an indicator of whether the row's position within its group (row_number()) equals the original value of 'A':

library(dplyr)
library(tidyr)
p <- 3
df1 %>%
    uncount(p) %>% 
    group_by(id) %>%
    mutate(A = as.integer(row_number() == A)) %>%  # 1 where the position within id equals the level of A, else 0
    ungroup()

-output

# A tibble: 9 × 4
     id     A     B     C
  <int> <int> <int> <int>
1     1     0     1     3
2     1     1     1     3
3     1     0     1     3
4     2     0     1     1
5     2     1     1     1
6     2     0     1     1
7     3     1     1     3
8     3     0     1     3
9     3     0     1     3

Or a similar option in base R with rep and ave:

transform(df1[rep(seq_len(nrow(df1)), each = p), ], 
     A = as.integer(A == ave(A, id, FUN = seq_along)))

data

df1 <- structure(list(id = 1:3, A = c(2L, 2L, 1L), B = c(1L, 1L, 1L), 
    C = c(3L, 1L, 3L)), class = "data.frame", row.names = c(NA, 
-3L))
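
If A is stored as a factor rather than as integer codes, the same idea can be sketched in base R without hard-coding p. This is only an illustration under the assumption that the factor's levels are declared in full (here 1 to 3, even though level 3 never occurs in the sample data); the names long and level_index are made up for the example.

```r
# Assumption: A is a factor whose levels are all declared up front
df1 <- data.frame(id = 1:3,
                  A  = factor(c(2, 2, 1), levels = 1:3),
                  B  = c(1L, 1L, 1L),
                  C  = c(3L, 1L, 3L))

p <- nlevels(df1$A)                                # number of levels of A

# Repeat every row p times, keeping the copies of one id together
long <- df1[rep(seq_len(nrow(df1)), each = p), ]

# Position 1..p of each copy within its id
level_index <- rep(seq_len(p), times = nrow(df1))

# 1 where the position equals the level code of A, 0 otherwise
long$A <- as.integer(as.integer(long$A) == level_index)
rownames(long) <- NULL
long
```

Each id then has exactly one row with A equal to 1, which can be checked with tapply(long$A, long$id, sum).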

CodePudding user response:

This might work:

library(tidyverse)
df <- data.frame(id = c(1:3), 
                 a = c(2, 2, 1), 
                 b = c(1, 1, 1),
                 c = c(3, 1, 3))
df
df[rep(seq_len(nrow(df)), times = 3), ] %>%  # repeat every row 3 times
  arrange(id) %>% 
  group_by(id) %>% 
  mutate(a = ifelse(row_number() == a, 1, 0)) %>% 
  ungroup()
#      id     a     b     c
#   <int> <dbl> <dbl> <dbl>
# 1     1     0     1     3
# 2     1     1     1     3
# 3     1     0     1     3
# 4     2     0     1     1
# 5     2     1     1     1
# 6     2     0     1     1
# 7     3     1     1     3
# 8     3     0     1     3
# 9     3     0     1     3
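
A variant that may be worth considering keeps an explicit column recording which level each repeated row stands for, so the result does not depend on row order. The sketch below uses a cross join via merge() with by = NULL in base R; the column name level and the object name long are assumptions made for illustration, and p = 3 is taken from the question.

```r
df <- data.frame(id = 1:3,
                 a = c(2, 2, 1),
                 b = c(1, 1, 1),
                 c = c(3, 1, 3))
p <- 3  # number of levels of a, as stated in the question

# by = NULL makes merge() return the Cartesian product:
# every row of df paired with every level 1..p
long <- merge(df, data.frame(level = seq_len(p)), by = NULL)

# Sort so the p copies of each id are adjacent and ordered by level
long <- long[order(long$id, long$level), ]

# Dummy: 1 where the row's level is the original value of a, 0 otherwise
long$a <- as.integer(long$a == long$level)
rownames(long) <- NULL
long
```

Dropping the helper column afterwards (long$level <- NULL) gives the same layout as the output above.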