I have dataset with multiple timepoints for e.g. col1, col2, col3, col4 with each column having similar categories (A,B,C,D)
|ID | col1 | col2| col3| col4| ColA
------------------------------
|1 |A |A |B |B
------------------------------
|2 |A |B |B |C
-------------------------------
|3 |B |B |C |C
-------------------------------
|4 |B |C |C |D
-------------------------------
|5 |C |C |D |D
--------------------------------
Now I want to create a new variable ("colA") such that if A is found in any of the col1-col4 then it will be assigned value 1 else value 0. Similar for 1 for B new in "colB", C for "colC", and D for "colD". Here ColA will have 1,1,0,0,0 and ColB will have 1,1,1,1,0 values so on.
I tried multiple options which I found here For eg. (without loop)
df %>%
mutate(colA=case_when(starts_with("col")=="A") ~ 1, TRUE~0)
or
(with a loop)
for (I in 1:4) {
df[,"colA"] <- ifelse(unlist(df[,paste0("col",i)]=="A",1,0)
}
but it did not work. in for loop, the data gets overwritten at the end of the loop, so colA reads as 0,0,0,0,0 as there are no A's in col4. I am using multiple datasets where range of i may vary from 1:2 to 1:4.
structure(list(id = 1:6, col1 = c(age11 = "A", age12 = "A", age13 = "B",
age14 = "A", age15 = "C", age16 = "B"), col2 = c(age21 = "A",
age22 = NA, age23 = "C", age24 = "A", age25 = "C", age26 = "B"
), col3 = c(age31 = "B", age32 = "B", age33 = NA, age34 = NA,
age35 = "C", age36 = "B"), col4 = c(age41 = "B", age42 = NA,
age43 = NA, age44 = NA, age45 = "D", age46 = "C")), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
CodePudding user response:
df %>%
pivot_longer(-ID, names_to = 'value', values_to = 'name') %>%
pivot_wider(values_fn = length, values_fill = 0, names_prefix = 'col')%>%
mutate(across(-ID, ~ (.>0)))%>%
right_join(df, 'ID')
# A tibble: 5 x 9
ID colA colB colC colD col1 col2 col3 col4
<int> <int> <int> <int> <int> <chr> <chr> <chr> <chr>
1 1 1 1 0 0 A A B B
2 2 1 1 1 0 A B B C
3 3 0 1 1 0 B B C C
4 4 0 1 1 1 B C C D
5 5 0 0 1 1 C C D D
CodePudding user response:
in Base R:
cbind(df, (table(cbind(df['ID'], unlist(df[-1])))>1))
ID col1 col2 col3 col4 A B C D
1 1 A A B B 1 1 0 0
2 2 A B B C 0 1 0 0
3 3 B B C C 0 1 1 0
4 4 B C C D 0 0 1 0
5 5 C C D D 0 0 1 1
CodePudding user response:
library(dplyr, warn.conflicts = FALSE)
df %>%
mutate(colA = coalesce(if_any(col1:col4, `==`, 'A'), 0))
#> # A tibble: 6 × 6
#> id col1 col2 col3 col4 colA
#> <int> <chr> <chr> <chr> <chr> <dbl>
#> 1 1 A A B B 1
#> 2 2 A <NA> B <NA> 1
#> 3 3 B C <NA> <NA> 0
#> 4 4 A A <NA> <NA> 1
#> 5 5 C C C D 0
#> 6 6 B B B C 0
Created on 2022-07-14 by the reprex package (v2.0.1)