transforming long data to wide in R-CodePudding

Hi i have data that looks like this :

X             snp_id is_severe encoding_1 encoding_2 encoding_0   chisq   p.value  AF_total  AF_LATIN
1 1 chr21-10139833-A-C         0          0          1          7 1.70625 0.191       0.4          0.3      
2 2 chr21-10139833-A-C         1          0          0         13 1.70625 0.191        0.4        0.3
3 3 chr21-10141374-T-C         0          0          1          7 1.70625 0.191        0.5         0.2
4 4 chr21-10141374-T-C         1          0          0         13 1.70625 0.191        0.5          0.2

and it continues to the right with

            snp_id         REF         ALT
chr21-10139833-A-C         A            C
chr21-10139833-A-C        A              C   
chr21-10141374-T-C         T              C
chr21-10141374-T-C         T             C

this data is very long and every snp_id has is_severe 0 and is_sever_1 ( in this example the p/chisq values are the same but in the whole data they are different what i wish to do is to reshape its structure and make it looks like this

 snp_id              is_severe_0_encoding_0 is_severe_0_encoding_1 is_severe_0_encoding_2    
chr21-10139833-A-C         7                          0                    1                        
chr21-10141374-T-C        7                           0                    1

and the table continues to the right with this:

snp_id              is_severe_1_encoding_0 is_severe_1_encoding_1 is_severe_1_encoding_2    
chr21-10139833-A-C         13                         0                    0                        
chr21-10141374-T-C        13                           0                    0

and the table continues to the right with this:

snp_id chisq p.value Af_total Af_latin REF ALT
chr21-10139833-A-C 1.70625 0.191 0.4 0.3 A C
chr21-10141374-T-C 1.70625 0.191 0.5 0.2 T C

i saw some answers in stack overflow on this topic but couldn't find that would fit my problem
for example : Converting data from wide to long format when id variables are encoded in column header How to reshape data from long to wide format

would appreciate any help

code for sample data :

snp_id <- c("chr21-10139833-A-C", "chr21-10139833-A-C","chr21-10141374-T-C","chr21-10141374-T-C")
is_severe <- c("0", "1","0","1")
encoding_1=c(0,0,0,0)
encoding_2=c(1,0,1,0)
encoding_0=c(7,13,7,13)
chisq=c(1.70625,1.70625,1.70625,1.70625)
pvalue=c(0.191,0.191,0.191,0.191)
REF=c("A","A","T","T")
ALT=c("C","C","C","C")
AF_TOTAL=c(0.4,0.4,0.5,0.5)
AF_latin=c(0.3,0.3,0.2,0.2)
df <- data.frame(snp_id,is_severe,encoding_1,encoding_2,encoding_0,chisq,pvalue,REF,ALT,AF_TOTAL,AF_latin)

CodePudding user response：

Is this what you need?

df %>%
   pivot_wider(names_from = is_severe,
               values_from = matches("encoding"),
               names_glue = "is_severe_{.name}") %>% 
   rename_with(~ str_replace_all(., "(is_severe_)(encoding_.)_(.)", "\\1\\3_\\2")) %>%
   select(snp_id, matches("is_severe"), everything())
# A tibble: 2 × 13
  snp_id             is_severe_0_encoding_1 is_severe_1_encoding_1 is_sev…¹ is_se…² is_se…³ is_se…⁴ chisq pvalue REF   ALT   AF_TO…⁵ AF_la…⁶
  <chr>                               <dbl>                  <dbl>    <dbl>   <dbl>   <dbl>   <dbl> <dbl>  <dbl> <chr> <chr>   <dbl>   <dbl>
1 chr21-10139833-A-C                      0                      0        1       0       7      13  1.71  0.191 A     C         0.4     0.3
2 chr21-10141374-T-C                      0                      0        1       0       7      13  1.71  0.191 T     C         0.5     0.2
# … with abbreviated variable names ¹is_severe_0_encoding_2, ²is_severe_1_encoding_2, ³is_severe_0_encoding_0, ⁴is_severe_1_encoding_0,
#   ⁵AF_TOTAL, ⁶AF_latin