I have a dataframe of the following form:
id A B C
1 2 1 3
2 2 1 1
3 1 1 3
. . . .
. . . .
id is a unique identifier variable and A is a categorical variable with p levels. What is the most straightforward way to transform the dataframe into a form where each row is repeated p times, and instead of A as a categorical variable for each row, it is a dummy variable which displays 1 in the row corresponding to the level of A for that id, and 0 otherwise? For example, the transformed dataframe above would look like this for 3 levels of A:
id A B C
1 0 1 3
1 1 1 3
1 0 1 3
2 0 1 1
2 1 1 1
2 0 1 1
3 1 1 3
3 0 1 3
3 0 1 3
. . . .
. . . .
Apologies if the title didn't specify the nature of this problem properly or it has already been asked: I'm not well versed in R so I don't really know how to ask this question in a concise way or search for it. Thanks!
CodePudding user response:
Use uncount to replicate the rows, then group by 'id' and change the values of 'A' by comparing them with the sequence of rows (row_number()) to create a logical vector:
library(dplyr)
library(tidyr)
p <- 3  # number of levels of A
df1 %>%
  uncount(p) %>%                         # repeat each row p times
  group_by(id) %>%
  mutate(A = +(row_number() == A)) %>%   # unary + turns TRUE/FALSE into 1/0
  ungroup()
-output
# A tibble: 9 × 4
     id     A     B     C
  <int> <int> <int> <int>
1     1     0     1     3
2     1     1     1     3
3     1     0     1     3
4     2     0     1     1
5     2     1     1     1
6     2     0     1     1
7     3     1     1     3
8     3     0     1     3
9     3     0     1     3
Or a similar option in base R with rep and ave:
transform(df1[rep(seq_len(nrow(df1)), each = p), ],
          A = +(A == ave(A, id, FUN = seq_along)))
data
df1 <- structure(list(id = 1:3, A = c(2L, 2L, 1L), B = c(1L, 1L, 1L),
C = c(3L, 1L, 3L)), class = "data.frame", row.names = c(NA,
-3L))
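If the levels of A are not simply 1..p (say A is a factor or a character column), the same idea can be wrapped in a small helper. This is only a sketch: expand_dummy and its lvls argument are names made up for illustration, not part of any package, and it assumes you pass every level explicitly, in the order you want the repeated rows to appear.

```r
# Hypothetical helper (not from any package): repeat each row once per level
# and turn `col` into a 0/1 indicator for that level.
expand_dummy <- function(df, col, lvls) {
  p <- length(lvls)
  # repeat every row p times, keeping the copies of each row together
  out <- df[rep(seq_len(nrow(df)), each = p), , drop = FALSE]
  # the level each copy stands for: lvls[1], ..., lvls[p], once per original row
  level <- rep(lvls, times = nrow(df))
  out[[col]] <- as.integer(out[[col]] == level)
  rownames(out) <- NULL
  out
}

df1 <- data.frame(id = 1:3, A = c(2L, 2L, 1L), B = c(1L, 1L, 1L), C = c(3L, 1L, 3L))
res <- expand_dummy(df1, "A", lvls = 1:3)
res
```

For a factor column, passing levels(df$A) as lvls should behave the same way, since a factor can be compared against a character vector of labels.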
CodePudding user response:
This might work:
library(tidyverse)
df <- data.frame(id = c(1:3),
                 a = c(2, 2, 1),
                 b = c(1, 1, 1),
                 c = c(3, 1, 3))
df
df %>%
  slice(rep(seq_len(n()), 3)) %>%   # repeat the rows; rep(df, 3) would repeat the columns instead
  arrange(id) %>%
  group_by(id) %>%
  mutate(a = ifelse(row_number() == a, 1, 0)) %>%
  ungroup()
#      id     a     b     c
#   <int> <dbl> <dbl> <dbl>
# 1     1     0     1     3
# 2     1     1     1     3
# 3     1     0     1     3
# 4     2     0     1     1
# 5     2     1     1     1
# 6     2     0     1     1
# 7     3     1     1     3
# 8     3     0     1     3
# 9     3     0     1     3
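Whichever version you use, it is cheap to sanity-check the result. Below is a rough sketch in base R, assuming the levels are coded 1..p as in the question and that the rows for each id are in level order. check_expansion is a made-up name for illustration, not an existing function.

```r
# Hypothetical checker: compares an expanded data frame with the original.
# Assumes the dummy column `col` was built from integer codes 1..p and that
# the p rows of each id appear in level order.
check_expansion <- function(expanded, original, col, p) {
  stopifnot(nrow(expanded) == p * nrow(original))
  # every id should have exactly one row flagged with 1
  ones <- tapply(expanded[[col]], expanded$id, sum)
  stopifnot(all(ones == 1))
  # the position of the 1 within each id should give back the original level
  pos <- tapply(expanded[[col]], expanded$id, which.max)
  stopifnot(all(pos[as.character(original$id)] == original[[col]]))
  invisible(TRUE)
}

df <- data.frame(id = 1:3, a = c(2, 2, 1), b = c(1, 1, 1), c = c(3, 1, 3))
p <- 3
expanded <- df[rep(seq_len(nrow(df)), each = p), ]
expanded$a <- as.numeric(expanded$a == rep(seq_len(p), times = nrow(df)))
check_expansion(expanded, df, "a", p)  # stops with an error if a check fails
```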