I'm trying to calculate the proportions of observed counts for a Pearson's chi-square test.
I have a data frame with a column of categories ("x") and a column of frequencies ("freq"), which I want to divide by the total number of samples to get proportions.
However, when I do this I get a bare vector of values: the categories don't appear in the output, and the values don't seem to be in the same category order as in the data frame.
Is there an argument I can add when calculating the proportions so that each proportion appears under its category (and the categories show up in the output)?
Thanks
EDIT
To clarify: this is the code I used to get the observed counts from my data (essentially, the number of samples in each category):
library(plyr)  # count() returning "x" and "freq" columns comes from plyr
observed <- count(category)
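The post does not include the raw data, so here is a minimal base-R sketch that builds an equivalent two-column table. It assumes hypothetical category labels "A" to "F" with counts chosen to match the proportions printed further down (90 samples in total), and it swaps in base `table()` for `plyr::count()`, so the column names `x` and `freq` are set by hand.

```r
# Hypothetical sample of 90 category labels (the real data is not in the post).
category <- rep(c("A", "B", "C", "D", "E", "F"), times = c(2, 50, 1, 12, 6, 19))

# Base-R stand-in for a count table: one row per category, counts in "freq".
tab <- table(category)
observed <- data.frame(x = names(tab), freq = as.vector(tab))
print(observed)
```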
What I want to do is turn these observed frequencies into proportions (dividing each observed frequency by the total number of samples).
I tried to do that using this code:
propobserved <- observed$freq/total
where total is the total number of samples (90). This gave me the following output:
propobserved
[1] 0.02222222 0.55555556 0.01111111 0.13333333 0.06666667 0.21111111
This output doesn't have the categories attached, and it is also not in the same order as in the screenshot (I worked the proportions out by hand, and the order is off).
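Dividing a column by a number returns an unnamed vector whose i-th element belongs to row i of `observed`, which is why no labels are printed. A minimal sketch, assuming the same hypothetical categories "A" to "F" (not the asker's real labels), that attaches the category names with base `setNames()` so each proportion prints under its category:

```r
# Hypothetical data reproducing the counts behind the output above.
category <- rep(c("A", "B", "C", "D", "E", "F"), times = c(2, 50, 1, 12, 6, 19))
tab <- table(category)
observed <- data.frame(x = names(tab), freq = as.vector(tab))

total <- 90
# Element i of this vector corresponds to row i of `observed`.
propobserved <- observed$freq / total

# Attach the category labels so the printed vector is labelled.
propobserved <- setNames(propobserved, observed$x)
print(round(propobserved, 4))
```

Because the labels travel with the values, any mismatch with a hand calculation shows up as a mismatch of labels rather than of positions.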
Answer:
This adds the proportion as a new column of the same data frame, so each proportion stays on the same row as its category. It also computes the total in the same step with sum(), instead of hard-coding it:
observed$prop <- observed$freq/sum(observed$freq)
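A sketch of the answer's approach on hypothetical data (categories "A" to "F", 90 samples), alongside the base-R `prop.table()` alternative, which divides a table by its sum and keeps the names. Since the goal is a Pearson's chi-square test, it also shows that `chisq.test()` takes the observed counts rather than the proportions; the equal expected probabilities passed as `p` are purely an assumption for illustration.

```r
# Hypothetical data: 90 samples across six made-up categories.
category <- rep(c("A", "B", "C", "D", "E", "F"), times = c(2, 50, 1, 12, 6, 19))
tab <- table(category)
observed <- data.frame(x = names(tab), freq = as.vector(tab))

# The answer's approach: a new column computed row by row,
# so each proportion stays aligned with its category in x.
observed$prop <- observed$freq / sum(observed$freq)
print(observed)

# Base-R alternative: prop.table() keeps the category names attached.
print(round(prop.table(tab), 4))

# Goodness-of-fit test on the counts (not the proportions).
# Equal expected probabilities are an assumption for illustration only.
res <- chisq.test(observed$freq, p = rep(1 / 6, 6))
print(res)
```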