Hi, I have data that looks like this:
X snp_id is_severe encoding_1 encoding_2 encoding_0 chisq p.value AF_total AF_LATIN
1 1 chr21-10139833-A-C 0 0 1 7 1.70625 0.191 0.4 0.3
2 2 chr21-10139833-A-C 1 0 0 13 1.70625 0.191 0.4 0.3
3 3 chr21-10141374-T-C 0 0 1 7 1.70625 0.191 0.5 0.2
4 4 chr21-10141374-T-C 1 0 0 13 1.70625 0.191 0.5 0.2
and it continues to the right with:
snp_id REF ALT
chr21-10139833-A-C A C
chr21-10139833-A-C A C
chr21-10141374-T-C T C
chr21-10141374-T-C T C
This data is very long, and every snp_id has one row with is_severe 0 and one row with is_severe 1. (In this example the chisq/p-values are the same, but in the whole data they are different.) What I wish to do is reshape its structure so that it looks like this:
snp_id is_severe_0_encoding_0 is_severe_0_encoding_1 is_severe_0_encoding_2
chr21-10139833-A-C 7 0 1
chr21-10141374-T-C 7 0 1
and the table continues to the right with this:
snp_id is_severe_1_encoding_0 is_severe_1_encoding_1 is_severe_1_encoding_2
chr21-10139833-A-C 13 0 0
chr21-10141374-T-C 13 0 0
and the table continues to the right with this:
snp_id chisq p.value Af_total Af_latin REF ALT
chr21-10139833-A-C 1.70625 0.191 0.4 0.3 A C
chr21-10141374-T-C 1.70625 0.191 0.5 0.2 T C
I saw some answers on Stack Overflow on this topic but couldn't find one that fits my problem, for example:
Converting data from wide to long format when id variables are encoded in column header
How to reshape data from long to wide format
I would appreciate any help.
Code for the sample data:
snp_id <- c("chr21-10139833-A-C", "chr21-10139833-A-C","chr21-10141374-T-C","chr21-10141374-T-C")
is_severe <- c("0", "1","0","1")
encoding_1=c(0,0,0,0)
encoding_2=c(1,0,1,0)
encoding_0=c(7,13,7,13)
chisq=c(1.70625,1.70625,1.70625,1.70625)
pvalue=c(0.191,0.191,0.191,0.191)
REF=c("A","A","T","T")
ALT=c("C","C","C","C")
AF_TOTAL=c(0.4,0.4,0.5,0.5)
AF_latin=c(0.3,0.3,0.2,0.2)
df <- data.frame(snp_id,is_severe,encoding_1,encoding_2,encoding_0,chisq,pvalue,REF,ALT,AF_TOTAL,AF_latin)
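Because the non-encoding columns (chisq, pvalue, REF, ALT, AF_TOTAL, AF_latin) should end up once per snp_id in the desired output, a quick check before reshaping is that each snp_id has exactly one is_severe 0 row and one is_severe 1 row, and that those other columns agree between the two rows. If they differ, a wide pivot keeps one row per distinct combination instead of one row per SNP. This is a minimal sketch assuming dplyr is installed; it rebuilds the sample data so it runs on its own, and `per_snp_check` and `problems` are illustrative names only.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  snp_id     = rep(c("chr21-10139833-A-C", "chr21-10141374-T-C"), each = 2),
  is_severe  = c("0", "1", "0", "1"),
  encoding_1 = c(0, 0, 0, 0),
  encoding_2 = c(1, 0, 1, 0),
  encoding_0 = c(7, 13, 7, 13),
  chisq      = rep(1.70625, 4),
  pvalue     = rep(0.191, 4),
  REF        = c("A", "A", "T", "T"),
  ALT        = c("C", "C", "C", "C"),
  AF_TOTAL   = c(0.4, 0.4, 0.5, 0.5),
  AF_latin   = c(0.3, 0.3, 0.2, 0.2)
)

# One row per snp_id: number of rows, whether both is_severe levels are
# present, and whether the per-SNP columns have a single combined value
per_snp_check <- df %>%
  group_by(snp_id) %>%
  summarise(
    n_rows = n(),
    has_both_levels = setequal(is_severe, c("0", "1")),
    constant_extra_cols = n_distinct(chisq, pvalue, REF, ALT, AF_TOTAL, AF_latin) == 1
  )

# snp_ids that would not collapse into a single wide row
problems <- filter(per_snp_check, n_rows != 2 | !has_both_levels | !constant_extra_cols)
print(problems)
```

On the full data, a non-empty `problems` table points at the SNPs whose chisq/p-value (or other per-SNP columns) differ between the two is_severe rows.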
Answer:
Is this what you need?
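library(dplyr)
library(tidyr)
library(stringr)

df %>%
  # widen the encoding columns by is_severe; names come out as is_severe_encoding_1_0 etc.
  pivot_wider(names_from = is_severe,
              values_from = matches("encoding"),
              names_glue = "is_severe_{.name}") %>%
  # move the is_severe level in front of the encoding name: is_severe_0_encoding_1
  rename_with(~ str_replace_all(., "(is_severe_)(encoding_.)_(.)", "\\1\\3_\\2")) %>%
  select(snp_id, matches("is_severe"), everything())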
# A tibble: 2 × 13
snp_id is_severe_0_encoding_1 is_severe_1_encoding_1 is_sev…¹ is_se…² is_se…³ is_se…⁴ chisq pvalue REF ALT AF_TO…⁵ AF_la…⁶
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <dbl>
1 chr21-10139833-A-C 0 0 1 0 7 13 1.71 0.191 A C 0.4 0.3
2 chr21-10141374-T-C 0 0 1 0 7 13 1.71 0.191 T C 0.5 0.2
# … with abbreviated variable names ¹is_severe_0_encoding_2, ²is_severe_1_encoding_2, ³is_severe_0_encoding_0, ⁴is_severe_1_encoding_0,
# ⁵AF_TOTAL, ⁶AF_latin
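If the exact column order from the question matters, an alternative sketch (not the answer's code, and assuming a tidyr version whose `pivot_wider()` has the `names_sort` argument) first pivots the three encoding columns to long format and then pivots back to wide with both is_severe and the encoding name in `names_from`. `names_glue` then builds the `is_severe_0_encoding_0` style names directly, so no regex renaming is needed, and `names_sort = TRUE` orders the new columns by is_severe and then by encoding. As with the answer, the remaining columns act as id columns and must be identical within each snp_id.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- data.frame(
  snp_id     = rep(c("chr21-10139833-A-C", "chr21-10141374-T-C"), each = 2),
  is_severe  = c("0", "1", "0", "1"),
  encoding_1 = c(0, 0, 0, 0),
  encoding_2 = c(1, 0, 1, 0),
  encoding_0 = c(7, 13, 7, 13),
  chisq      = rep(1.70625, 4),
  pvalue     = rep(0.191, 4),
  REF        = c("A", "A", "T", "T"),
  ALT        = c("C", "C", "C", "C"),
  AF_TOTAL   = c(0.4, 0.4, 0.5, 0.5),
  AF_latin   = c(0.3, 0.3, 0.2, 0.2)
)

wide <- df %>%
  # long: one row per snp_id x is_severe x encoding column
  pivot_longer(starts_with("encoding"), names_to = "encoding", values_to = "count") %>%
  # wide: one column per (is_severe, encoding) pair, sorted 0/1 then encoding_0/1/2
  pivot_wider(names_from = c(is_severe, encoding),
              values_from = count,
              names_glue = "is_severe_{is_severe}_{encoding}",
              names_sort = TRUE) %>%
  # put the columns in the order shown in the question
  select(snp_id, starts_with("is_severe"), chisq, pvalue, AF_TOTAL, AF_latin, REF, ALT)

print(wide)
```

Without `names_sort = TRUE`, `pivot_wider()` orders the new columns by their first appearance in the data, which here would put the `encoding_1` columns before `encoding_0`.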