I have a tibble in R arranged like this (just the first 20 rows of thousands):
A tibble: 20 × 4
TF | confidence | target | mor | |
---|---|---|---|---|
1 | Ahr | B | Cyp1a1 | 1 |
2 | Ahr | B | Cyp1a2 | 1 |
3 | Ahr | B | Cyp1b1 | 1 |
4 | Ahr | B | Fos | 1 |
5 | Ahr | B | Myc | 1 |
6 | Ahr | B | Ugt1a7c | 1 |
7 | Ahr | B | Ugt1a6a | 1 |
8 | Ahr | B | Ugt1a6b | 1 |
9 | Ahr | B | Asap1 | 1 |
10 | Ahr | B | Erg | 1 |
11 | Ahr | B | Vgll4 | 1 |
12 | Ar | A | Abce1 | 1 |
13 | Ar | A | Acad12 | 1 |
14 | Ar | A | Acad10 | 1 |
15 | Ar | A | Ahsg | 1 |
16 | Ar | A | Akr1b3 | 1 |
17 | Ar | A | Akt1 | 1 |
18 | Ar | A | Anapc10 | 1 |
19 | Ar | A | Ap2m1 | 1 |
20 | Ar | A | Appbp2 | 1 |
and I want to arrange it to look like this:
TF | Description | Targets | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Ahr | None | Cyp1a1 | Cyp1a2 | Cyp1b1 | Fos | Myc | Ugt1a7c | Ugt1a6a | Ugt1a6b | Asap1 | Erg | Vgll4 |
Ar | None | Abce1 | Acad12 | Acad10 | Ahsg | Akr1b3 | Akr1b3 | Akt1 | Anapc10 | Ap2m1 | Appbp2 |
...The table goes on for thousands of lines. There can be a varying number of targets for a given TF value. How do I create this second table as a dataframe in R given the first table?
Thank you!
CodePudding user response:
library(dplyr)
df %>%
group_by(TF) %>%
summarise(Targets = list(target))
#> TF Targets
#> 1 Ahr Cyp1a1, Cyp1a2, Cyp1b1, Fos, Myc, Ugt1a7c, Ugt1a6a, Ugt1a6b, Asap1, Erg, Vgll4
#> 2 Ar Abce1, Acad12, Acad10, Ahsg, Akr1b3, Akt1, Anapc10, Ap2m1, Appbp2
This results in a so-called list-column (Targets
is a list of character vectors), that will work for varying numbers of targets per TF.
With data:
df <- read.table(text =
'TF confidence target mor
Ahr B Cyp1a1 1
Ahr B Cyp1a2 1
Ahr B Cyp1b1 1
Ahr B Fos 1
Ahr B Myc 1
Ahr B Ugt1a7c 1
Ahr B Ugt1a6a 1
Ahr B Ugt1a6b 1
Ahr B Asap1 1
Ahr B Erg 1
Ahr B Vgll4 1
Ar A Abce1 1
Ar A Acad12 1
Ar A Acad10 1
Ar A Ahsg 1
Ar A Akr1b3 1
Ar A Akt1 1
Ar A Anapc10 1
Ar A Ap2m1 1
Ar A Appbp2 1',
header = TRUE)