Issue
I want to conduct a chi-square test on my data to explore if there are statistical differences between whistle types (A-F)
amongst countries 'France' and 'Germany'
(see the reproducible data frame below).
I would like to produce a table exactly the same as the Whistle Type Count table
(see below) but instead of the counts of observations per whistle type, I would like to calculate the mean of each whistle type(A-F) per country using either the packages dplyr
or data.table
. Both the vectors Country
and Whistle Type
are factors.
For this analysis, I cannot figure out how to calculate the mean for each whistle type per country.
If anyone is able to help, I would like to thank you in advance.
Whistle Type Count Table
library(dplyr)
#Count the number of Whistle types per country
Count_Whistle.type <- Whistle_Parameters %>% dplyr::count(Whistle_Type, Country, sort=TRUE)
Counts of Observations per whistle type
Whistle_Type Country n
1 F Germany 38
2 E France 37
3 B France 33
4 C Germany 33
5 B Germany 31
6 A France 27
7 F France 27
8 A Germany 25
9 D France 23
10 C France 21
11 D Germany 21
12 E Germany 19
What I need is something like this (these mean values are made up) showing the mean for each whistle type per country
Whistle_Type Country Mean
1 F Germany 14.9
2 E France 12.4
3 B France 9.6
Reproducible R Code
#Dummy data
#Create a cluster column with dummy data (clusters = 3)
f1 <- gl(n = 2, k=167.5); f1
#Produce a data frame for the dummy level data
f2<-as.data.frame(f1)
#Rename the column f2
colnames(f2)<-"Country"
#How many rows
nrow(f2)
#Rename the levels of the dependent variable 'Country' as classifiers
#prefer the inputs to be factors
levels(f2$Country) <- c("France", "Germany")
#Add a vector called Whistle Types
#Add a vector called Behaviors
Whistle_Types<-sample(c('A', 'B', 'C', 'D',
'E', 'F'), 335, replace=TRUE)
#Create random numbers
Start.Freq<-runif(335, min=1.195110e 02, max=23306.000000)
End.Freq<-runif(335, min=3.750000e 02, max=65310.000000)
Delta.Time<-runif(335, min=2.192504e-02, max=3.155762)
Low.Freq<-runif(335, min=6.592500e 02, max=20491.803000)
High.Freq<-runif(335, min=2.051000e 03, max=36388.450000)
Peak.Freq<-runif(335, min=7.324220 02, max=35595.703000)
Center.Freq<-runif(335, min=2.190000e-02, max=3.155800)
Delta.Freq<-runif(335, min=1.171875 03, max=30761.719000)
Delta.Time<-runif(335, min=2.192504e-02, max=3.155762)
#Bind the columns together
Bind<-cbind(f2, Start.Freq, End.Freq, Low.Freq, High.Freq, Peak.Freq, Center.Freq, Delta.Freq, Delta.Time, Whistle_Types)
#Rename the columns
colnames(Bind)<-c('Country', 'Low.Freq', 'High.Freq', 'Start.Freq', 'End.Freq', 'Peak.Freq', 'Center.Freq',
'Delta.Freq', 'Delta.Time',"Whistle_Type")
#Produce a dataframe
Whistle_Parameters<-as.data.frame(Bind)
Whistle_Parameters
CodePudding user response:
With data.table
:
setDT(Whistle_Parameters)
# vector of variable to average
var_to_mean <- setdiff(names(Whistle_Parameters),c("Country","Whistle_Type"))
# mean per Country and Whistle_type, for each column
Whistle_Parameters[,lapply(.SD,mean),.SDcols = var_to_mean,by = .(Country,Whistle_Type)]
Country Whistle_Type Low.Freq High.Freq Start.Freq End.Freq Peak.Freq Center.Freq Delta.Freq Delta.Time
1: France B 12518.44 31489.32 11258.852 20716.42 17208.48 1.527760 12618.33 1.851037
2: France C 10843.06 36625.38 10069.604 18183.59 18561.31 1.628858 15495.80 1.485466
3: France F 11232.58 31955.38 10833.592 20992.41 19007.11 1.562517 17165.26 1.729563
4: France E 10418.82 34813.67 11325.300 15257.26 17749.05 1.614468 18908.37 1.549006
5: France D 14526.32 29232.38 10738.944 16636.51 16756.75 2.004199 14306.67 1.608868
6: France A 14378.98 37006.09 10261.817 16479.82 18099.49 1.467998 12310.01 1.538442
7: Germany C 10866.43 40694.70 10420.398 16017.48 17431.95 1.721555 14245.38 1.195733
8: Germany E 14830.39 38750.77 10313.795 19415.02 20879.32 1.808080 17165.47 1.207900
9: Germany F 11270.00 27803.95 10173.037 19908.47 18155.31 1.375614 13727.61 1.528704
10: Germany B 12186.73 30519.04 9743.094 17460.80 19031.94 1.700328 17530.92 1.782998
11: Germany A 11639.71 34723.41 10769.073 18420.37 13990.86 1.485000 15683.72 1.488470
12: Germany D 12461.53 41251.08 8739.983 21661.02 15025.17 1.515318 12918.67 1.516498
With dplyr
:
library(dplyr)
Whistle_Parameters %>%
group_by(Country,Whistle_Type) %>%
summarise(across(var_to_mean,mean))
`summarise()` has grouped output by 'Country'. You can override using the `.groups` argument.
# A tibble: 12 x 10
# Groups: Country [2]
Country Whistle_Type Low.Freq High.Freq Start.Freq End.Freq Peak.Freq Cente~1 Delta~2 Delta~3
<fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 France A 14379. 37006. 10262. 16480. 18099. 1.47 12310. 1.54
2 France B 12518. 31489. 11259. 20716. 17208. 1.53 12618. 1.85
3 France C 10843. 36625. 10070. 18184. 18561. 1.63 15496. 1.49
4 France D 14526. 29232. 10739. 16637. 16757. 2.00 14307. 1.61
5 France E 10419. 34814. 11325. 15257. 17749. 1.61 18908. 1.55
6 France F 11233. 31955. 10834. 20992. 19007. 1.56 17165. 1.73
7 Germany A 11640. 34723. 10769. 18420. 13991. 1.49 15684. 1.49
8 Germany B 12187. 30519. 9743. 17461. 19032. 1.70 17531. 1.78
9 Germany C 10866. 40695. 10420. 16017. 17432. 1.72 14245. 1.20
10 Germany D 12462. 41251. 8740. 21661. 15025. 1.52 12919. 1.52
11 Germany E 14830. 38751. 10314. 19415. 20879. 1.81 17165. 1.21
12 Germany F 11270. 27804. 10173. 19908. 18155. 1.38 13728. 1.53
CodePudding user response:
library(data.table)
setDT(Whistle_Parameters)[, lapply(.SD, mean, na.rm=T), .(Country, Whistle_Type)]
OR
library(dplyr)
Whistle_Parameters %>%
group_by(Country, Whistle_Type) %>%
summarize(across(everything(), mean))