Good night everyone: First of all, thank you for your time and for your effort answering this question. I have the following data:
aaa aac aag aat aca acc acg act aga agc agg agt ata atc atg att caa cac cag cat cca ccc ccg cct
1501 895 660 989 759 858 658 524 667 878 608 621 897 983 901 1030 669 713 1381 708 610 514 1308 542
cga cgc cgg cgt cta ctc ctg ctt gaa gac gag gat gca gcc gcg gct gga ggc ggg ggt gta gtc gtg gtt
516 888 1398 664 196 410 1371 523 1071 624 519 1118 1045 881 1007 835 894 1037 734 915 618 660 862 863
taa tac tag tat tca tcc tcg tct tga tgc tgg tgt tta ttc ttg ttt
805 567 214 996 1057 721 493 599 1255 965 840 803 871 816 729 1364
From here I have created the following table in R:
aa ac ag at ca cc cg ct ga gc gg gt ta tc tg tt
a 1501 895 660 989 759 858 658 524 667 878 608 621 897 983 901 1030
c 669 713 1381 708 610 514 1308 542 516 888 1398 664 196 410 1371 523
g 1071 624 519 1118 1045 881 1007 835 894 1037 734 915 618 660 862 863
t 805 567 214 996 1057 721 493 599 1255 965 840 803 871 816 729 1364
The code used is this:
triple <- count(morseq,wordsize=3)
triple
R <- matrix(triple,4,16,byrow=T)
colnames(R) <- c("aa","ac","ag","at","ca","cc","cg","ct","ga","gc","gg","gt","ta","tc","tg","tt") ## asigna nombre a las columnas
row.names(R) <- c("a","c","g","t")
I want to simplify the table above by converting it to a 4x4 table where the columns represent the third letter and the rows represents the first letter. Therefore, I have to sum the values where the first and final letter are the same from the 4x16 table and I don't know how to start doing it.
The final result should be something like this:
a c g t
a x x x x
c x x x x
g x x x x
t x x x x
Sinceresly,
CodePudding user response:
One option would be to first reshape to long using tidyr::pivot_longer
and to afterwards use summarise
to compute the sum:
library(tidyr)
library(dplyr)
library(tibble)
pivot_longer(R, everything(), names_to = c(".value", "name"), names_pattern = "(.)(.)") |>
group_by(name) |>
summarise(across(everything(), sum)) |>
column_to_rownames("name")
#> a c g t
#> a 4046 3471 3332 2582
#> c 2799 2974 3768 2869
#> g 2774 3466 3580 3863
#> t 3811 2500 3003 3780
DATA
R <- structure(list(aa = c(1501L, 669L, 1071L, 805L), ac = c(
895L,
713L, 624L, 567L
), ag = c(660L, 1381L, 519L, 214L), at = c(
989L,
708L, 1118L, 996L
), ca = c(759L, 610L, 1045L, 1057L), cc = c(
858L,
514L, 881L, 721L
), cg = c(658L, 1308L, 1007L, 493L), ct = c(
524L,
542L, 835L, 599L
), ga = c(667L, 516L, 894L, 1255L), gc = c(
878L,
888L, 1037L, 965L
), gg = c(608L, 1398L, 734L, 840L), gt = c(
621L,
664L, 915L, 803L
), ta = c(897L, 196L, 618L, 871L), tc = c(
983L,
410L, 660L, 816L
), tg = c(901L, 1371L, 862L, 729L), tt = c(
1030L,
523L, 863L, 1364L
)), class = "data.frame", row.names = c(
"a",
"c", "g", "t"
))
CodePudding user response:
Assuming that you have a table
rather than a data frame, you can do:
library(tidyverse)
tab %>%
as.data.frame() %>%
mutate(Var2 = substr(Var2, 2, 2)) %>%
group_by(Var1, Var2) %>%
summarise(Freq = sum(Freq), .groups = "drop") %>%
pivot_wider(names_from = Var2, values_from = Freq) %>%
column_to_rownames("Var1")
#> a c g t
#> a 3824 3614 2827 3164
#> c 1991 2525 5458 2437
#> g 3628 3202 3122 3731
#> t 3988 3069 2276 3762
Data
tab <- structure(c(1501L, 669L, 1071L, 805L, 895L, 713L, 624L, 567L,
660L, 1381L, 519L, 214L, 989L, 708L, 1118L, 996L, 759L, 610L,
1045L, 1057L, 858L, 514L, 881L, 721L, 658L, 1308L, 1007L, 493L,
524L, 542L, 835L, 599L, 667L, 516L, 894L, 1255L, 878L, 888L,
1037L, 965L, 608L, 1398L, 734L, 840L, 621L, 664L, 915L, 803L,
897L, 196L, 618L, 871L, 983L, 410L, 660L, 816L, 901L, 1371L,
862L, 729L, 1030L, 523L, 863L, 1364L), dim = c(4L, 16L), dimnames = list(
c("a", "c", "g", "t"), c("aa", "ac", "ag", "at", "ca", "cc",
"cg", "ct", "ga", "gc", "gg", "gt", "ta", "tc", "tg", "tt"
)), class = "table")
tab
#> aa ac ag at ca cc cg ct ga gc gg gt ta tc tg tt
#> a 1501 895 660 989 759 858 658 524 667 878 608 621 897 983 901 1030
#> c 669 713 1381 708 610 514 1308 542 516 888 1398 664 196 410 1371 523
#> g 1071 624 519 1118 1045 881 1007 835 894 1037 734 915 618 660 862 863
#> t 805 567 214 996 1057 721 493 599 1255 965 840 803 871 816 729 1364
Created on 2022-10-12 with reprex v2.0.2