I looking for a way to transform my data frame into the format shown in out
below. Basically, for each species I will show how frequent (Freq
) it occurs within each group
, frequency is calculated as ratio of samples with count <0 and samples with count >0. If I have 10 samples and 3 samples have counts >0, the ratio is 0.3. In addition I would like a column with the absolute number of samples with count >0.
I have played around with dplyr::mutate
which I believe should work.
> df
sample1 sample2 sample3 sample4
Species1 2 12 52 221
Species2 0 13 0 0
Species3 5 0 0 25
Species4 0 0 0 0
Group Gr1 Gr1 Gr2 Gr2
> dput(df)
structure(list(sample1 = c("2", "0", "5", "0", "Gr1"), sample2 = c("12",
"13", "0", "0", "Gr1"), sample3 = c("52", "0", "0", "0", "Gr2"
), sample4 = c("221", "0", "25", "0", "Gr2")), class = "data.frame", row.names = c("Species1",
"Species2", "Species3", "Species4", "Group"))
out
Species Group Freq Absolute
Species1 Gr1 1 2
Species1 Gr2 1 2
Species2 Gr1 0.5 1
Species2 Gr2 0 0
Species3 Gr1 0.5 1
Species3 Gr2 0.5 1
Species4 Gr1 0 0
Species4 Gr2 0 0
CodePudding user response:
I am not sure I am understanding your question but here's some code which reshapes the data into a format that will be easier to work with.
library(tidyverse)
df %>%
rownames_to_column("Species") %>%
pivot_longer(-Species) %>%
group_by(name) %>%
mutate(Group = last(value)) %>%
filter(Species != "Group") %>%
mutate(value = as.numeric(value)) %>%
ungroup()
That produces:
# A tibble: 16 x 4
Species name value Group
<chr> <chr> <dbl> <chr>
1 Species1 sample1 2 Gr1
2 Species1 sample2 12 Gr1
3 Species1 sample3 52 Gr2
4 Species1 sample4 221 Gr2
5 Species2 sample1 0 Gr1
6 Species2 sample2 13 Gr1
7 Species2 sample3 0 Gr2
8 Species2 sample4 0 Gr2
9 Species3 sample1 5 Gr1
10 Species3 sample2 0 Gr1
11 Species3 sample3 0 Gr2
12 Species3 sample4 25 Gr2
13 Species4 sample1 0 Gr1
14 Species4 sample2 0 Gr1
15 Species4 sample3 0 Gr2
16 Species4 sample4 0 Gr2
Can you describe more about the logic you want to apply? I can coincidentally get the output you describe with the code below, but I'm not sure it aligns with how you want it to work. Is the "/2" supposed to change based on the number of Groups
in the data?
[code above] %>%
count(Species, Group, wt = value > 0, name = "Absolute") %>%
mutate(Freq = Absolute / 2)
# A tibble: 8 x 4
Species Group Absolute Freq
<chr> <chr> <int> <dbl>
1 Species1 Gr1 2 1
2 Species1 Gr2 2 1
3 Species2 Gr1 1 0.5
4 Species2 Gr2 0 0
5 Species3 Gr1 1 0.5
6 Species3 Gr2 1 0.5
7 Species4 Gr1 0 0
8 Species4 Gr2 0 0
CodePudding user response:
The problem here is that although df
is technically a data frame, it is not well structured. A data frame should have one column per variable and one row per observation. Your data would make more sense if it were transposed first:
library(tibble)
library(dplyr)
df <- rownames_to_column(as.data.frame(t(df)), "sample")
df[2:5] <- lapply(df[2:5], as.numeric)
df
#> sample Species1 Species2 Species3 Species4 Group
#> 1 sample1 2 0 5 0 Gr1
#> 2 sample2 12 13 0 0 Gr1
#> 3 sample3 52 0 0 0 Gr2
#> 4 sample4 221 0 25 0 Gr2
Now we can pivot to make the Species
its own column, and it is straightforward to do the calculations you need:
tidyr::pivot_longer(df, 2:5) %>%
group_by(name, Group) %>%
summarise(absolute = sum(value > 0),
Freq = absolute / length(name))
#> # A tibble: 8 x 4
#> # Groups: name [4]
#> name Group absolute Freq
#> <chr> <chr> <int> <dbl>
#> 1 Species1 Gr1 2 1
#> 2 Species1 Gr2 2 1
#> 3 Species2 Gr1 1 0.5
#> 4 Species2 Gr2 0 0
#> 5 Species3 Gr1 1 0.5
#> 6 Species3 Gr2 1 0.5
#> 7 Species4 Gr1 0 0
#> 8 Species4 Gr2 0 0
CodePudding user response:
An option with data.table
library(data.table)
melt(type.convert(data.table::transpose(setDT(df,
keep.rownames = TRUE), make.names = 'rn'), as.is = TRUE),
id.var = 'Group', variable.name = 'Species')[,
.(Absolute = sum(value > 0)), .(Group, Species)][, Freq := Absolute/2][]
Group Species Absolute Freq
1: Gr1 Species1 2 1.0
2: Gr2 Species1 2 1.0
3: Gr1 Species2 1 0.5
4: Gr2 Species2 0 0.0
5: Gr1 Species3 1 0.5
6: Gr2 Species3 1 0.5
7: Gr1 Species4 0 0.0
8: Gr2 Species4 0 0.0