Issue
I have a data frame with three columns: two are categorical (Whistle_Type and Country) and one is numeric (n, the counts of whistle types A-F). I produced it with the count() function from dplyr (see below).
I want to run a Chi-square test to determine whether there are any significant differences between whistle types among the countries Germany and France.
To do this, I want to use a function to create a distribution table showing the p-values of the Chi-square tests. I would like to produce something like this.
Desired Distribution table with p-values
          A  B  C  D  E  F
France    p  p  p  p  p  p
Germany   p  p  p  p  p  p
*p stands for p-values
I can't quite figure out how to write the function so that it produces the outcome I would like. I also don't understand this error message, as I am passing both a data frame and a list into the function:
Error in model.frame.default(formula = as.formula(paste(x, " ~ Country")), :
'data' must be a data.frame, environment, or list
Called from: model.frame.default(formula = as.formula(paste(x, " ~ Country")),
data = Count.Whistle.type_ChiSq$n)
If anyone is able to help (see the reproducible data frame below), I would be deeply appreciative.
R code
Produce a list showing counts of whistle types per country using the function count()
Count.Whistle.type_ChiSq <- Whistle_Parameters %>% dplyr::count(Whistle_Type, Country)
Count.Whistle.type_ChiSq
List of counts of whistle types per country
   Whistle_Type Country  n
1             A  France 90
2             A Germany 70
3             B  France 34
4             B Germany 10
5             C  France 24
6             C Germany  9
7             D  France 44
8             D Germany 25
9             E  France 21
10            E Germany 39
11            F  France 25
12            F Germany 32
Chi-Square function
#List of whistle types to use in the Chi-square test
Outcomes_Whistle_Types<-c("A", "B","C", "D", "E", "F")
#Keep only the unique values of the vector Country
Country <- unique(Parameters$Country)
#Produce a distribution table with p-values for the Chi-square test
Chi_Whistle<-sapply(Outcomes_Whistle_Types, \(x) chisq.test(xtabs(as.formula(paste(x, ' ~ Country')), Count.Whistle.type_ChiSq$n))$p.value)
#Set the names for the columns and rows in the distribution table
chi_Country <- setNames(Chi_Whistle, Country)
#Chi-Square test
chi_Square_results<-lapply(chi_Country, chisq.test)
chi_Square_results
Many thanks in advance
Reproducible Dataframe
#Dummy data
#Create a grouping column with dummy data (2 levels)
f1 <- gl(n = 2, k=167.5); f1
#Produce a data frame for the dummy level data
f2<-as.data.frame(f1)
#Rename the column f2
colnames(f2)<-"Country"
#How many rows
nrow(f2)
#Rename the levels of the dependent variable 'Country' as classifiers
#prefer the inputs to be factors
levels(f2$Country) <- c("France", "Germany")
#Add a vector called Whistle_Types
Whistle_Types<-sample(c('A', 'B', 'C', 'D',
'E', 'F'), 335, replace=TRUE)
#Create random numbers
Start.Freq<-runif(335, min=1.195110e+02, max=23306.000000)
End.Freq<-runif(335, min=3.750000e+02, max=65310.000000)
Delta.Time<-runif(335, min=2.192504e-02, max=3.155762)
Low.Freq<-runif(335, min=6.592500e+02, max=20491.803000)
High.Freq<-runif(335, min=2.051000e+03, max=36388.450000)
Peak.Freq<-runif(335, min=7.324220e+02, max=35595.703000)
Center.Freq<-runif(335, min=2.190000e-02, max=3.155800)
Delta.Freq<-runif(335, min=1.171875e+03, max=30761.719000)
Delta.Time<-runif(335, min=2.192504e-02, max=3.155762)
#Bind the columns together
Bind<-cbind(f2, Start.Freq, End.Freq, Low.Freq, High.Freq, Peak.Freq, Center.Freq, Delta.Freq, Delta.Time, Whistle_Types)
#Rename the columns
#(names follow the order used in cbind() above)
colnames(Bind)<-c('Country', 'Start.Freq', 'End.Freq', 'Low.Freq', 'High.Freq', 'Peak.Freq', 'Center.Freq',
'Delta.Freq', 'Delta.Time',"Whistle_Type")
#Produce a dataframe
Whistle_Parameters<-as.data.frame(Bind)
Answer
To be honest, I'm not sure about your desired output. Which p-values do you want to show for each combination of country and whistle type?
We can easily calculate a single p-value that tests whether the distribution of whistle types differs by country. This is similar to the first example in the docs of ?chisq.test().
For this we just need the Whistle_Parameters data. We can use table() to create a contingency table, which we can then use as input for chisq.test().
The first example in the docs of ?chisq.test() can be found in Agresti, A. (2007) on page 38.
freq_tbl <- table(Whistle_Parameters$Country, Whistle_Parameters$Whistle_Type)
freq_tbl
#>
#> A B C D E F
#> France 28 24 29 25 24 38
#> Germany 35 32 21 19 40 20
chisq.test(freq_tbl)
#>
#> Pearson's Chi-squared test
#>
#> data: freq_tbl
#> X-squared = 13.602, df = 5, p-value = 0.01834
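The same test can also be run on pre-aggregated counts such as the ones printed in the question. The sketch below is only an illustration: the object names (counts, freq_from_counts, overall_test) are made up here, and the numbers are typed in from the question's count() output rather than taken from the random data, so the result will differ from the output above. It also shows what seems to trigger the error in the question: xtabs() expects the whole data frame as its data argument, whereas Count.Whistle.type_ChiSq$n is a bare numeric vector.

```r
# Counts copied by hand from the question's dplyr::count() output
# (the name `counts` is hypothetical)
counts <- data.frame(
  Whistle_Type = rep(c("A", "B", "C", "D", "E", "F"), each = 2),
  Country      = rep(c("France", "Germany"), times = 6),
  n            = c(90, 70, 34, 10, 24, 9, 44, 25, 21, 39, 25, 32)
)

# Pass the data frame itself as `data`; the count column goes on the
# left-hand side of the formula, the two factors on the right
freq_from_counts <- xtabs(n ~ Country + Whistle_Type, data = counts)
freq_from_counts

# One test of independence between Country and Whistle_Type
overall_test <- chisq.test(freq_from_counts)
overall_test$p.value
```

With the counts already in a Country by Whistle_Type table, no loop over whistle types is needed for the overall test.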
The random data, made reproducible with set.seed():
set.seed(123)
#Dummy data
#Create a grouping column with dummy data (2 levels)
f1 <- gl(n = 2, k=167.5); f1
#> [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [75] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [112] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [149] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
#> [186] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
#> [223] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
#> [260] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
#> [297] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
#> [334] 2 1
#> Levels: 1 2
#Produce a data frame for the dummy level data
f2<-as.data.frame(f1)
#Rename the column f2
colnames(f2)<-"Country"
#How many rows
nrow(f2)
#> [1] 335
#Rename the levels of the dependent variable 'Country' as classifiers
#prefer the inputs to be factors
levels(f2$Country) <- c("France", "Germany")
#Add a vector called Whistle_Types
Whistle_Types<-sample(c('A', 'B', 'C', 'D',
'E', 'F'), 335, replace=TRUE)
#Create random numbers
Start.Freq<-runif(335, min=1.195110e+02, max=23306.000000)
End.Freq<-runif(335, min=3.750000e+02, max=65310.000000)
Delta.Time<-runif(335, min=2.192504e-02, max=3.155762)
Low.Freq<-runif(335, min=6.592500e+02, max=20491.803000)
High.Freq<-runif(335, min=2.051000e+03, max=36388.450000)
Peak.Freq<-runif(335, min=7.324220e+02, max=35595.703000)
Center.Freq<-runif(335, min=2.190000e-02, max=3.155800)
Delta.Freq<-runif(335, min=1.171875e+03, max=30761.719000)
Delta.Time<-runif(335, min=2.192504e-02, max=3.155762)
#Bind the columns together
Bind<-cbind(f2, Start.Freq, End.Freq, Low.Freq, High.Freq, Peak.Freq, Center.Freq, Delta.Freq, Delta.Time, Whistle_Types)
#Rename the columns
#(names follow the order used in cbind() above)
colnames(Bind)<-c('Country', 'Start.Freq', 'End.Freq', 'Low.Freq', 'High.Freq', 'Peak.Freq', 'Center.Freq',
'Delta.Freq', 'Delta.Time',"Whistle_Type")
#Produce a dataframe
Whistle_Parameters<-as.data.frame(Bind)
Created on 2022-10-06 by the reprex package (v2.0.1)
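If a p-value per country and whistle type is really what is wanted, one possible reading (an assumption on my part, not something the question confirms) is a post-hoc look at the standardized residuals that chisq.test() returns in its stdres component. Under independence they are approximately standard normal, so each can be turned into a two-sided p-value. The sketch below uses the counts from the question. The names counts_mat, cell_p and p_adjusted are made up for illustration, and the Holm correction is just one choice of multiple-testing adjustment.

```r
# Counts from the question, typed in as a Country x Whistle_Type matrix
# (the name `counts_mat` is hypothetical)
counts_mat <- matrix(
  c(90, 34, 24, 44, 21, 25,
    70, 10,  9, 25, 39, 32),
  nrow = 2, byrow = TRUE,
  dimnames = list(Country = c("France", "Germany"),
                  Whistle_Type = c("A", "B", "C", "D", "E", "F"))
)

test <- chisq.test(counts_mat)

# Two-sided p-value for each standardized residual (one per cell)
cell_p <- 2 * pnorm(-abs(test$stdres))
round(cell_p, 4)

# With only two countries the residuals in a column differ only in sign,
# so both rows carry the same p-values; adjust one row for the six tests
p_adjusted <- p.adjust(cell_p["France", ], method = "holm")
round(p_adjusted, 4)
```

The cell_p matrix has the same shape as the table requested in the question, but with two countries it effectively contains one p-value per whistle type.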