Let's suppose we have the dataframe below:
df <- read.table(header=T, text=
'Patient_ID Gene Type
1 ATM 3
1 MEN1 1
2 BRCA1 3
2 RAD51C 2
2 BRCA2 2
3 CHEK2 1
4 MUTYH 1
4 BRCA2 3', stringsAsFactors=F)
How can I rearrange this dataframe to make it look like the following :
ID ATM MEN1 BRCA1 RAD51C CHEK2 MUTYH BRCA2
1 3 1
2 3 2 2
3 1
4 1 3
Please note that each row is now a unique case and the column Type
was used to provide the values for the new created columns.
CodePudding user response:
Your data is long/tidy. You want it to be wide. There are many functions to do this in R. A commonly used one is tidyr::pivot_wider()
, which I demonstrate below:
library(tidyverse)
df <- read.table(header=T, text=
'Patient_ID Gene Type
1 ATM 3
1 MEN1 1
2 BRCA1 3
2 RAD51C 2
2 BRCA2 2
3 CHEK2 1
4 MUTYH 1
4 BRCA2 3', stringsAsFactors=F)
# Blank cells will be NA
df |>
rename(ID = Patient_ID) |>
pivot_wider(names_from = Gene,
values_from = Type)
#> # A tibble: 4 × 8
#> ID ATM MEN1 BRCA1 RAD51C BRCA2 CHEK2 MUTYH
#> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 1 3 1 NA NA NA NA NA
#> 2 2 NA NA 3 2 2 NA NA
#> 3 3 NA NA NA NA NA 1 NA
#> 4 4 NA NA NA NA 3 NA 1
# Blank cells as empty strings ("")
df |>
rename(ID = Patient_ID) |>
pivot_wider(names_from = Gene,
values_from = Type,
values_fn = as.character,
values_fill = "")
#> # A tibble: 4 × 8
#> ID ATM MEN1 BRCA1 RAD51C BRCA2 CHEK2 MUTYH
#> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 "3" "1" "" "" "" "" ""
#> 2 2 "" "" "3" "2" "2" "" ""
#> 3 3 "" "" "" "" "" "1" ""
#> 4 4 "" "" "" "" "3" "" "1"
Created on 2022-05-23 by the reprex package (v2.0.1)
EDIT: Second solution simplified in line with @DarrenTsai's comment