Home > database >  dplyr/data.table: How to calculate the mean for counts of observations per group for two vectors con
dplyr/data.table: How to calculate the mean for counts of observations per group for two vectors con

Time:10-06

Issue

I want to conduct a chi-square test on my data to explore if there are statistical differences between whistle types (A-F) amongst countries 'France' and 'Germany' (see the reproducible data frame below).

I would like to produce a table exactly the same as the Whistle Type Count table (see below) but instead of the counts of observations per whistle type, I would like to calculate the mean of each whistle type(A-F) per country using either the packages dplyr or data.table. Both the vectors Country and Whistle Type are factors.

For this analysis, I cannot figure out how to calculate the mean for each whistle type per country.

If anyone is able to help, I would like to thank you in advance.

Whistle Type Count Table

library(dplyr)

#Count the number of Whistle types per country
Count_Whistle.type <- Whistle_Parameters %>% dplyr::count(Whistle_Type, Country, sort=TRUE)

Counts of Observations per whistle type 

   Whistle_Type Country  n
1             F Germany 38
2             E  France 37
3             B  France 33
4             C Germany 33
5             B Germany 31
6             A  France 27
7             F  France 27
8             A Germany 25
9             D  France 23
10            C  France 21
11            D Germany 21
12            E Germany 19

What I need is something like this (these mean values are made up) showing the mean for each whistle type per country

   Whistle_Type Country  Mean
1             F Germany 14.9
2             E  France 12.4
3             B  France  9.6

Reproducible R Code

#Dummy data
#Create a cluster column with dummy data (clusters = 3)
f1 <- gl(n = 2, k=167.5); f1

#Produce a data frame for the dummy level data
f2<-as.data.frame(f1)

#Rename the column f2
colnames(f2)<-"Country"

#How many rows
nrow(f2)

#Rename the levels of the dependent variable 'Country' as classifiers
#prefer the inputs to be factors
levels(f2$Country) <- c("France", "Germany")

#Add a vector called Whistle Types
#Add a vector called Behaviors
Whistle_Types<-sample(c('A', 'B', 'C', 'D',
                     'E', 'F'), 335, replace=TRUE)

#Create random numbers
Start.Freq<-runif(335, min=1.195110e 02, max=23306.000000)
End.Freq<-runif(335, min=3.750000e 02, max=65310.000000)
Delta.Time<-runif(335, min=2.192504e-02, max=3.155762)
Low.Freq<-runif(335, min=6.592500e 02, max=20491.803000)
High.Freq<-runif(335, min=2.051000e 03, max=36388.450000)
Peak.Freq<-runif(335, min=7.324220 02, max=35595.703000)
Center.Freq<-runif(335, min=2.190000e-02, max=3.155800)
Delta.Freq<-runif(335, min=1.171875 03, max=30761.719000)
Delta.Time<-runif(335, min=2.192504e-02, max=3.155762)

#Bind the columns together
Bind<-cbind(f2, Start.Freq, End.Freq,  Low.Freq, High.Freq, Peak.Freq,  Center.Freq, Delta.Freq, Delta.Time, Whistle_Types)

#Rename the columns 
colnames(Bind)<-c('Country', 'Low.Freq', 'High.Freq', 'Start.Freq', 'End.Freq', 'Peak.Freq', 'Center.Freq', 
                  'Delta.Freq', 'Delta.Time',"Whistle_Type")

#Produce a dataframe
Whistle_Parameters<-as.data.frame(Bind)
Whistle_Parameters

CodePudding user response:

With data.table:

setDT(Whistle_Parameters)
# vector of variable to average
var_to_mean <- setdiff(names(Whistle_Parameters),c("Country","Whistle_Type"))
# mean per Country and Whistle_type, for each column
Whistle_Parameters[,lapply(.SD,mean),.SDcols = var_to_mean,by = .(Country,Whistle_Type)]

   Country Whistle_Type Low.Freq High.Freq Start.Freq End.Freq Peak.Freq Center.Freq Delta.Freq Delta.Time
 1:  France            B 12518.44  31489.32  11258.852 20716.42  17208.48    1.527760   12618.33   1.851037
 2:  France            C 10843.06  36625.38  10069.604 18183.59  18561.31    1.628858   15495.80   1.485466
 3:  France            F 11232.58  31955.38  10833.592 20992.41  19007.11    1.562517   17165.26   1.729563
 4:  France            E 10418.82  34813.67  11325.300 15257.26  17749.05    1.614468   18908.37   1.549006
 5:  France            D 14526.32  29232.38  10738.944 16636.51  16756.75    2.004199   14306.67   1.608868
 6:  France            A 14378.98  37006.09  10261.817 16479.82  18099.49    1.467998   12310.01   1.538442
 7: Germany            C 10866.43  40694.70  10420.398 16017.48  17431.95    1.721555   14245.38   1.195733
 8: Germany            E 14830.39  38750.77  10313.795 19415.02  20879.32    1.808080   17165.47   1.207900
 9: Germany            F 11270.00  27803.95  10173.037 19908.47  18155.31    1.375614   13727.61   1.528704
10: Germany            B 12186.73  30519.04   9743.094 17460.80  19031.94    1.700328   17530.92   1.782998
11: Germany            A 11639.71  34723.41  10769.073 18420.37  13990.86    1.485000   15683.72   1.488470
12: Germany            D 12461.53  41251.08   8739.983 21661.02  15025.17    1.515318   12918.67   1.516498

With dplyr:


library(dplyr)

Whistle_Parameters %>%
  group_by(Country,Whistle_Type) %>%
  summarise(across(var_to_mean,mean))

`summarise()` has grouped output by 'Country'. You can override using the `.groups` argument.
# A tibble: 12 x 10
# Groups:   Country [2]
   Country Whistle_Type Low.Freq High.Freq Start.Freq End.Freq Peak.Freq Cente~1 Delta~2 Delta~3
   <fct>   <chr>           <dbl>     <dbl>      <dbl>    <dbl>     <dbl>   <dbl>   <dbl>   <dbl>
 1 France  A              14379.    37006.     10262.   16480.    18099.    1.47  12310.    1.54
 2 France  B              12518.    31489.     11259.   20716.    17208.    1.53  12618.    1.85
 3 France  C              10843.    36625.     10070.   18184.    18561.    1.63  15496.    1.49
 4 France  D              14526.    29232.     10739.   16637.    16757.    2.00  14307.    1.61
 5 France  E              10419.    34814.     11325.   15257.    17749.    1.61  18908.    1.55
 6 France  F              11233.    31955.     10834.   20992.    19007.    1.56  17165.    1.73
 7 Germany A              11640.    34723.     10769.   18420.    13991.    1.49  15684.    1.49
 8 Germany B              12187.    30519.      9743.   17461.    19032.    1.70  17531.    1.78
 9 Germany C              10866.    40695.     10420.   16017.    17432.    1.72  14245.    1.20
10 Germany D              12462.    41251.      8740.   21661.    15025.    1.52  12919.    1.52
11 Germany E              14830.    38751.     10314.   19415.    20879.    1.81  17165.    1.21
12 Germany F              11270.    27804.     10173.   19908.    18155.    1.38  13728.    1.53

CodePudding user response:

library(data.table)
setDT(Whistle_Parameters)[, lapply(.SD, mean, na.rm=T), .(Country, Whistle_Type)]

OR

library(dplyr)
Whistle_Parameters %>% 
  group_by(Country, Whistle_Type) %>% 
  summarize(across(everything(), mean))
  • Related