Calculate frequency for group-rows within data frame-CodePudding

I looking for a way to transform my data frame into the format shown in out below. Basically, for each species I will show how frequent (Freq) it occurs within each group, frequency is calculated as ratio of samples with count <0 and samples with count >0. If I have 10 samples and 3 samples have counts >0, the ratio is 0.3. In addition I would like a column with the absolute number of samples with count >0.

I have played around with dplyr::mutate which I believe should work.

> df
         sample1 sample2 sample3 sample4
Species1       2      12      52     221
Species2       0      13       0       0
Species3       5       0       0      25
Species4       0       0       0       0
Group        Gr1     Gr1     Gr2     Gr2


> dput(df)
structure(list(sample1 = c("2", "0", "5", "0", "Gr1"), sample2 = c("12", 
"13", "0", "0", "Gr1"), sample3 = c("52", "0", "0", "0", "Gr2"
), sample4 = c("221", "0", "25", "0", "Gr2")), class = "data.frame", row.names = c("Species1", 
"Species2", "Species3", "Species4", "Group"))





 out

Species Group Freq Absolute
Species1 Gr1 1 2
Species1 Gr2 1 2
Species2 Gr1 0.5 1
Species2 Gr2 0 0
Species3 Gr1 0.5 1
Species3 Gr2 0.5 1
Species4 Gr1 0 0
Species4 Gr2 0 0

CodePudding user response：

I am not sure I am understanding your question but here's some code which reshapes the data into a format that will be easier to work with.

library(tidyverse)
df %>%
  rownames_to_column("Species") %>%
  pivot_longer(-Species) %>%
  group_by(name) %>%
  mutate(Group = last(value)) %>%
  filter(Species != "Group") %>%
  mutate(value = as.numeric(value)) %>%
  ungroup()

That produces:

# A tibble: 16 x 4
   Species  name    value Group
   <chr>    <chr>   <dbl> <chr>
 1 Species1 sample1     2 Gr1  
 2 Species1 sample2    12 Gr1  
 3 Species1 sample3    52 Gr2  
 4 Species1 sample4   221 Gr2  
 5 Species2 sample1     0 Gr1  
 6 Species2 sample2    13 Gr1  
 7 Species2 sample3     0 Gr2  
 8 Species2 sample4     0 Gr2  
 9 Species3 sample1     5 Gr1  
10 Species3 sample2     0 Gr1  
11 Species3 sample3     0 Gr2  
12 Species3 sample4    25 Gr2  
13 Species4 sample1     0 Gr1  
14 Species4 sample2     0 Gr1  
15 Species4 sample3     0 Gr2  
16 Species4 sample4     0 Gr2

Can you describe more about the logic you want to apply? I can coincidentally get the output you describe with the code below, but I'm not sure it aligns with how you want it to work. Is the "/2" supposed to change based on the number of Groups in the data?

[code above] %>% 
  count(Species, Group, wt = value > 0, name = "Absolute") %>%
  mutate(Freq = Absolute / 2)


# A tibble: 8 x 4
  Species  Group Absolute  Freq
  <chr>    <chr>    <int> <dbl>
1 Species1 Gr1          2   1  
2 Species1 Gr2          2   1  
3 Species2 Gr1          1   0.5
4 Species2 Gr2          0   0  
5 Species3 Gr1          1   0.5
6 Species3 Gr2          1   0.5
7 Species4 Gr1          0   0  
8 Species4 Gr2          0   0

CodePudding user response：

The problem here is that although df is technically a data frame, it is not well structured. A data frame should have one column per variable and one row per observation. Your data would make more sense if it were transposed first:

library(tibble)
library(dplyr)

df <- rownames_to_column(as.data.frame(t(df)), "sample")

df[2:5] <- lapply(df[2:5], as.numeric)

df

#>    sample Species1 Species2 Species3 Species4 Group
#> 1 sample1        2        0        5        0   Gr1
#> 2 sample2       12       13        0        0   Gr1
#> 3 sample3       52        0        0        0   Gr2
#> 4 sample4      221        0       25        0   Gr2

Now we can pivot to make the Species its own column, and it is straightforward to do the calculations you need:

tidyr::pivot_longer(df, 2:5) %>%
  group_by(name, Group) %>%
  summarise(absolute = sum(value > 0),
            Freq = absolute / length(name))

#> # A tibble: 8 x 4
#> # Groups:   name [4]
#>   name     Group absolute  Freq
#>   <chr>    <chr>    <int> <dbl>
#> 1 Species1 Gr1          2   1  
#> 2 Species1 Gr2          2   1  
#> 3 Species2 Gr1          1   0.5
#> 4 Species2 Gr2          0   0  
#> 5 Species3 Gr1          1   0.5
#> 6 Species3 Gr2          1   0.5
#> 7 Species4 Gr1          0   0  
#> 8 Species4 Gr2          0   0

CodePudding user response：

An option with data.table

library(data.table)
melt(type.convert(data.table::transpose(setDT(df, 
   keep.rownames = TRUE), make.names = 'rn'), as.is = TRUE),
    id.var = 'Group', variable.name = 'Species')[, 
   .(Absolute = sum(value > 0)), .(Group, Species)][, Freq := Absolute/2][]
   Group  Species Absolute Freq
1:   Gr1 Species1        2  1.0
2:   Gr2 Species1        2  1.0
3:   Gr1 Species2        1  0.5
4:   Gr2 Species2        0  0.0
5:   Gr1 Species3        1  0.5
6:   Gr2 Species3        1  0.5
7:   Gr1 Species4        0  0.0
8:   Gr2 Species4        0  0.0