Add frequency into dataframe for each group and unique element (R)-CodePudding

I have a table such as

Group Family Nb 
1     A      15
2     B      20
3     A      2
3     B      1
3     C      1
4     D      10
4     A      5
5     B      1
5     D      1

And I would like to transform that dataframe such that I have each unique Family element in columns, and for each Group, the frequency of the Nb element, I should then get :

 Group  A    B    C    D    E  F
 1      1    0    0    0    0  0
 2      0    1    0    0    0  0
 3      0.5  0.25 0.25 0    0  0
 4      0.33 0    0    0.67 0  0
 5      0    0.5  0    0.5  0  0

Here is the dput format of the tabel if it can helps :

Family = c("A", "B", "A", "B", "C", "D", "A", "B", "D"), 
    Nb = c(15L, 20L, 2L, 1L, 1L, 10L, 5L, 1L, 1L)), class = "data.frame", row.names = c(NA, 
-9L))

CodePudding user response：

in Base R:

 prop.table(xtabs(Nb ~ ., df), 1)

#     Family
#Group         A         B         C         D
#    1 1.0000000 0.0000000 0.0000000 0.0000000
#    2 0.0000000 1.0000000 0.0000000 0.0000000
#    3 0.5000000 0.2500000 0.2500000 0.0000000
#    4 0.3333333 0.0000000 0.0000000 0.6666667
#    5 0.0000000 0.5000000 0.0000000 0.5000000

If you need it as a data.frame, just wrap the results in as.data.frame.matrix

CodePudding user response：

You can first group_by the Group column, then calculate the frequency and finally pivot the data into a "wide" format.

library(tidyverse)

df %>% 
  group_by(Group) %>% 
  mutate(Nb = Nb/sum(Nb)) %>% 
  pivot_wider(Group, names_from = "Family", values_from = "Nb", values_fill = 0)

# A tibble: 5 × 5
# Groups:   Group [5]
  Group     A     B     C     D
  <int> <dbl> <dbl> <dbl> <dbl>
1     1 1      0     0    0    
2     2 0      1     0    0    
3     3 0.5    0.25  0.25 0    
4     4 0.333  0     0    0.667
5     5 0      0.5   0    0.5

CodePudding user response：

Another possible solution:

library(tidyverse)

df %>% 
  pivot_wider(names_from = Family, values_from = Nb, values_fill = 0) %>% 
  mutate(aux = rowSums(.[-1]), across(-Group, ~ .x / aux), aux = NULL)

#> # A tibble: 5 × 5
#>   Group     A     B     C     D
#>   <int> <dbl> <dbl> <dbl> <dbl>
#> 1     1 1      0     0    0    
#> 2     2 0      1     0    0    
#> 3     3 0.5    0.25  0.25 0    
#> 4     4 0.333  0     0    0.667
#> 5     5 0      0.5   0    0.5