I have a file that looks like this (file.txt):
column1 column2 column3 column4 column5 column6 column7 column8 column9 column10 column11 column12 column13 column14 column15 column16
1 chr1_10000044_A_T_b38 ENSG00000280113.2 171773 29 30 0.02 0.33 0.144 0.14 chr1 10000044 A T chr1 10060102
2 chr7_10000044_A_T_b38 ENSG00000178585.14 -58627 29 30 0.024 0.26 0.16 0.15 chr7 10000044 A T chr7 18054785
4 chr1_10000044_A_T_b38 ENSG00000280113.2 89708 29 30 0.0 0.03 -0.0 0.038 chr1 10000044 A T chr1 18054638
5 chr1_10000044_A_T_b38 ENSG00000231181.1 -472482 29 30 0.02 0.16 0.11 0.07 chr1 10000044 A T chr1 18052645
6 chr8_304959_A_T_b38 ENSG00000178585.14 -586 60 30 0.026 0.76 0.16 0.15 chr7 10000044 A T chr7 18054785
I want to group the rows that share the same value in column 3 and print each group separately. For the first unique value, the output would be:
column1 column2 column3 column4 column5 column6 column7 column8 column9 column10 column11 column12 column13 column14 column15 column16
1 chr1_10000044_A_T_b38 ENSG00000280113.2 171773 29 30 0.02 0.33 0.144 0.14 chr1 10000044 A T chr1 10060102
4 chr1_10000044_A_T_b38 ENSG00000280113.2 89708 29 30 0.0 0.03 -0.0 0.038 chr1 10000044 A T chr1 18054638
For the second unique value in column 3, it should be:
column1 column2 column3 column4 column5 column6 column7 column8 column9 column10 column11 column12 column13 column14 column15 column16
2 chr7_10000044_A_T_b38 ENSG00000178585.14 -58627 29 30 0.024 0.26 0.16 0.15 chr7 10000044 A T chr7 18054785
6 chr8_304959_A_T_b38 ENSG00000178585.14 -586 60 30 0.026 0.76 0.16 0.15 chr7 10000044 A T chr7 18054785
CodePudding user response:
Update
You can split the data frame by each unique value of column3 using group_split, which returns a list with one tibble per group.
library(dplyr)
group_split(df, column3)
Or in base R:
split(df, f = df$column3)
Output
[[1]]
# A tibble: 2 × 16
column1 column2 column3 column4 column5 column6 column7 column8 column9 column10 column11 column12 column13 column14 column15 column16
<int> <chr> <chr> <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <chr> <int> <chr> <lgl> <chr> <int>
1 2 chr7_10000044_A_T_b38 ENSG00000178585.14 -58627 29 30 0.024 0.26 0.16 0.15 chr7 10000044 A TRUE chr7 18054785
2 6 chr8_304959_A_T_b38 ENSG00000178585.14 -586 60 30 0.026 0.76 0.16 0.15 chr7 10000044 A TRUE chr7 18054785
[[2]]
# A tibble: 1 × 16
column1 column2 column3 column4 column5 column6 column7 column8 column9 column10 column11 column12 column13 column14 column15 column16
<int> <chr> <chr> <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <chr> <int> <chr> <lgl> <chr> <int>
1 5 chr1_10000044_A_T_b38 ENSG00000231181.1 -472482 29 30 0.02 0.16 0.11 0.07 chr1 10000044 A TRUE chr1 18052645
[[3]]
# A tibble: 2 × 16
column1 column2 column3 column4 column5 column6 column7 column8 column9 column10 column11 column12 column13 column14 column15 column16
<int> <chr> <chr> <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <chr> <int> <chr> <lgl> <chr> <int>
1 1 chr1_10000044_A_T_b38 ENSG00000280113.2 171773 29 30 0.02 0.33 0.144 0.14 chr1 10000044 A TRUE chr1 10060102
2 4 chr1_10000044_A_T_b38 ENSG00000280113.2 89708 29 30 0 0.03 0 0.038 chr1 10000044 A TRUE chr1 18054638
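If only one group is needed at a time, the list returned by split can be indexed by the column3 value, because split names each element after its group (group_split returns an unnamed list). The following is a minimal sketch; df_small is a hypothetical cut-down stand-in for df with only two of its columns.

```r
# Hypothetical stand-in for df, keeping only the columns needed here
df_small <- data.frame(
  column1 = c(1L, 2L, 4L, 5L, 6L),
  column3 = c("ENSG00000280113.2", "ENSG00000178585.14",
              "ENSG00000280113.2", "ENSG00000231181.1",
              "ENSG00000178585.14"),
  stringsAsFactors = FALSE
)

# split() names each list element after its column3 value
groups <- split(df_small, f = df_small$column3)

# Pull out a single group by its identifier
one_gene <- groups[["ENSG00000280113.2"]]
print(one_gene)

# Or walk over all groups, printing each one under its identifier
for (id in names(groups)) {
  cat("\n", id, "\n", sep = "")
  print(groups[[id]], row.names = FALSE)
}
```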
If you need to write each group to its own file, you can use lapply. Here, I use the unique values of column3 (the names of the list returned by split) to name each txt file.
mylist <- split(df, f = df$column3)
# paste0() avoids the space that paste() would insert before ".txt"
lapply(names(mylist), function(x) write.table(mylist[[x]], file = paste0(x, ".txt"), sep = "\t"))
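By default write.table also writes row names and quotes character values, so the files will not look exactly like the input. Below is a sketch of a variant that writes space-separated files without row names or quotes. The output directory out_dir and the cut-down df_small are assumptions made for illustration only.

```r
# Hypothetical stand-in for df, keeping only the columns needed here
df_small <- data.frame(
  column1 = c(1L, 2L, 4L, 5L, 6L),
  column3 = c("ENSG00000280113.2", "ENSG00000178585.14",
              "ENSG00000280113.2", "ENSG00000231181.1",
              "ENSG00000178585.14"),
  stringsAsFactors = FALSE
)

# Hypothetical output location; a temporary directory keeps the example tidy
out_dir <- file.path(tempdir(), "by_column3")
dir.create(out_dir, showWarnings = FALSE)

groups <- split(df_small, f = df_small$column3)

# One file per group, named after the column3 value
for (id in names(groups)) {
  write.table(groups[[id]],
              file = file.path(out_dir, paste0(id, ".txt")),
              sep = " ", quote = FALSE, row.names = FALSE)
}

written <- sort(list.files(out_dir))
print(written)
```

A for loop is used instead of lapply because only the side effect (the written files) matters here, so no list of return values gets printed.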
Data
df <- structure(list(column1 = c(1L, 2L, 4L, 5L, 6L), column2 = c("chr1_10000044_A_T_b38",
"chr7_10000044_A_T_b38", "chr1_10000044_A_T_b38", "chr1_10000044_A_T_b38",
"chr8_304959_A_T_b38"), column3 = c("ENSG00000280113.2", "ENSG00000178585.14",
"ENSG00000280113.2", "ENSG00000231181.1", "ENSG00000178585.14"
), column4 = c(171773L, -58627L, 89708L, -472482L, -586L), column5 = c(29L,
29L, 29L, 29L, 60L), column6 = c(30L, 30L, 30L, 30L, 30L), column7 = c(0.02,
0.024, 0, 0.02, 0.026), column8 = c(0.33, 0.26, 0.03, 0.16, 0.76
), column9 = c(0.144, 0.16, 0, 0.11, 0.16), column10 = c(0.14,
0.15, 0.038, 0.07, 0.15), column11 = c("chr1", "chr7", "chr1",
"chr1", "chr7"), column12 = c(10000044L, 10000044L, 10000044L,
10000044L, 10000044L), column13 = c("A", "A", "A", "A", "A"),
column14 = c(TRUE, TRUE, TRUE, TRUE, TRUE), column15 = c("chr1",
"chr7", "chr1", "chr1", "chr7"), column16 = c(10060102L,
18054785L, 18054638L, 18052645L, 18054785L)), class = "data.frame", row.names = c(NA,
-5L))
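Note that column14 in the data above holds TRUE rather than the allele T from the question's file. When a column contains only T, read.table guesses that it is logical. The sketch below, which inlines two rows of the question's file so that it runs on its own, shows one way to keep the column as text with colClasses. Reading from an inline string is only for illustration; with the real file, the file name would be passed instead of text.

```r
# Two rows of the question's file, inlined so the example is self-contained
txt <- "column1 column2 column3 column4 column5 column6 column7 column8 column9 column10 column11 column12 column13 column14 column15 column16
1 chr1_10000044_A_T_b38 ENSG00000280113.2 171773 29 30 0.02 0.33 0.144 0.14 chr1 10000044 A T chr1 10060102
2 chr7_10000044_A_T_b38 ENSG00000178585.14 -58627 29 30 0.024 0.26 0.16 0.15 chr7 10000044 A T chr7 18054785"

# Default type guessing: a column holding only "T" becomes logical TRUE
guessed <- read.table(text = txt, header = TRUE)
print(class(guessed$column14))

# Naming the column in colClasses keeps the allele as text;
# the other columns are still guessed as usual
kept <- read.table(text = txt, header = TRUE,
                   colClasses = c(column14 = "character"))
print(kept$column14)
```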