I have a dataframe that looks like this.
Gene SNP Score
1 AKT3 rs2220276 6.5091
2 ARHGAP44 rs2220276 4.7194
3 BRINP2 rs16851037 3.2606
4 C12orf42 rs16851037 3.2563
5 CCDC122 rs11619756 4.3142
6 CCDC68 rs11619756 2.3614
I want to transform the dataframe so it looks like this - essentially creating an extra column for each element in the Gene
column that matches the SNP
column.
Gene 1 SNP Gene 1 Score Gene 2 Gene 2 Score
1 AKT3 rs2220276 6.5091 ARHGAP44 4.7194
2 BRINP2 rs16851037 3.2606 C12orf42 3.2563
5 CCDC122 rs11619756 4.3142 CCDC68 2.3614
How can I achieve this?
df <- data.frame(Gene = c("AKT3", "ARHGAP44", "BRINP2", "C12orf42", "CCDC122","CCDC68") ,
"SNP" = c("rs2220276", "rs2220276", "rs16851037", "rs16851037","rs11619756", "rs11619756"),
Score = c(6.5091, 4.7194, 3.2606, 3.2563, 4.3142, 2.3614))
CodePudding user response:
You can transform this by using pivot_wider()
from tidyr
package, assisted by a helper column, id
:
library(dplyr)
library(tidyr)
df %>%
group_by(SNP) %>%
mutate(id = row_number()) %>%
ungroup() %>%
pivot_wider(id_cols=SNP, names_from = id, values_from = c(Gene, Score))
Output:
SNP Gene_1 Gene_2 Score_1 Score_2
<chr> <chr> <chr> <dbl> <dbl>
1 rs2220276 AKT3 ARHGAP44 6.51 4.72
2 rs16851037 BRINP2 C12orf42 3.26 3.26
3 rs11619756 CCDC122 CCDC68 4.31 2.36
CodePudding user response:
The data is already in a favorable shape.
Anyways, the desired format can be generated using pivot_wider
and unnest_wider
library(tidyr)
pivot_wider(df, SNP, values_from=c(Score, Gene), values_fn=list) |>
unnest_wider(Score_, names_sep="") |>
unnest_wider(Gene_, names_sep="")
# A tibble: 3 × 5
SNP Score_1 Score_2 Gene_1 Gene_2
<chr> <dbl> <dbl> <chr> <chr>
1 rs2220276 6.51 4.72 AKT3 ARHGAP44
2 rs16851037 3.26 3.26 BRINP2 C12orf42
3 rs11619756 4.31 2.36 CCDC122 CCDC68
CodePudding user response:
Assuming you always want to keep the first occurrence of any duplicate you can do
library(tidyverse)
df %>%
distinct(SNP, .keep_all = TRUE)
Gene SNP Score
1 AKT3 rs2220276 6.5091
2 BRINP2 rs16851037 3.2606
3 CCDC122 rs11619756 4.3142
CodePudding user response:
Using dcast
library(data.table)
dcast(setDT(df), SNP ~ rowid(SNP), value.var = c("Gene", "Score"))
Key: <SNP>
SNP Gene_1 Gene_2 Score_1 Score_2
<char> <char> <char> <num> <num>
1: rs11619756 CCDC122 CCDC68 4.3142 2.3614
2: rs16851037 BRINP2 C12orf42 3.2606 3.2563
3: rs2220276 AKT3 ARHGAP44 6.5091 4.7194
Or similar option with pivot_wider
library(tidyr)
library(dplyr)
df %>%
mutate(rn = rowid(SNP)) %>%
pivot_wider(names_from = rn, values_from = c(Gene, Score))
# A tibble: 3 × 5
SNP Gene_1 Gene_2 Score_1 Score_2
<chr> <chr> <chr> <dbl> <dbl>
1 rs2220276 AKT3 ARHGAP44 6.51 4.72
2 rs16851037 BRINP2 C12orf42 3.26 3.26
3 rs11619756 CCDC122 CCDC68 4.31 2.36