I have a dataframe containing a column of the following format, where I have Vars separated by commas ordered by count, with their respective counts after the colon
df1 <- data.frame(A = c('Var1:1', 'Var1:2,Var2:1', 'Var3:2,Var2:1'))
A |
---|
Var1:1 |
Var1:2,Var2:1 |
Var2:1,Var3:1 |
I've separated these out with:
df1_split <- data.frame(dplyr::str_split(df1, A, pattern = ":|,", simplify = TRUE))
Which yields:
X1 | X2 | X3 | X4 | |
---|---|---|---|---|
Var1 | 1 | |||
Var1 | 2 | Var2 | 1 | |
Var3 | 2 | Var2 | 1 |
As you can see, sometimes variables are in column X1, but sometimes they appear in column X3. So Vars are always in odd columns, and their counts from the :
are always in the proceeding even column#
My end goal is to have a dataframe with columns in the format:
Var1 | Var 2 | Var 3 |
---|---|---|
1 | ||
2 | 1 | |
1 | 2 |
I've tried using a function, for loop and case when for this, as I have 19 unique variable names in my original dataframe but they don't always appear in the same order.
fVARCOUNT <- function(x, y){`
# x is df_split, y are the unique Var nanmes e.g. Var 1, Var 2
df_mutated <- df_split
# Loop over each unique variable
for(i in y){
df_mutated <- df_mutated %>%
# Check if that variable is in one of X1, X3, X5 and get the value from the proceeding column if it is
dplyr::mutate('{{i}}' := dplyr::case_when(X1 == i ~ X2,
X3 == i ~ X4))
}
}
However, the result I'm getting is as follows:
Var1 | Var2 | Var 3 |
---|---|---|
1 | ||
2 | ||
2 |
It's as though case_when
is not checking beyond X1
.
CodePudding user response:
Data
df1 <- data.frame(A = c('Var1:1', 'Var1:2,Var2:1', 'Var3:2,Var2:1'))
Code
library(tidyr)
library(dplyr)
df1 %>%
mutate(id = row_number()) %>%
separate_rows(A,sep = ",") %>%
separate(col = A,into = c("name","value")) %>%
pivot_wider(names_from = name,values_from = value)
Output
# A tibble: 3 x 4
id Var1 Var2 Var3
<int> <chr> <chr> <chr>
1 1 1 NA NA
2 2 2 1 NA
3 3 NA 1 2