I have a dataframe:
df <- data.frame(
  ID = 1:20,
  Ethnicity = rep(c("White", "Asian", "Black", "Hispanic", "Other"), times = 4),
  Age = 1:20,
  Set = rep(c(1, 2, 3, 4), times = 5)
)
I want to know the ethnicity and age breakdown by Set. I usually use table(df$Ethnicity), but how do I do this by Set?
The desired output for ethnicity is a table with the percentage of each ethnicity within each Set. In this example, every set should show 20% White, 20% Asian, 20% Black, 20% Hispanic and 20% Other. For age, the output should be a table of the mean age of each set.
Thank you!
Answer:
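You can use prop.table() on a two-way table. The second argument (margin = 2) divides each column by its total, so the proportions are computed within each Set: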
prop.table(table(df$Ethnicity, df$Set), 2)
             1   2   3   4
  Asian    0.2 0.2 0.2 0.2
  Black    0.2 0.2 0.2 0.2
  Hispanic 0.2 0.2 0.2 0.2
  Other    0.2 0.2 0.2 0.2
  White    0.2 0.2 0.2 0.2
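Since the question asks for percentages rather than proportions, here is a minimal sketch that assumes the same df as in the question. The count cross-tab comes from table() with two factors, and the column-wise prop.table() is scaled by 100. The names counts and pct are only illustrative.

```r
# Recreate the example data from the question
df <- data.frame(
  ID = 1:20,
  Ethnicity = rep(c("White", "Asian", "Black", "Hispanic", "Other"), times = 4),
  Age = 1:20,
  Set = rep(c(1, 2, 3, 4), times = 5)
)

# Two-way table of counts: rows = Ethnicity, columns = Set
counts <- table(df$Ethnicity, df$Set)
print(counts)

# margin = 2 divides each column by its total, so each Set sums to 100
pct <- round(100 * prop.table(counts, margin = 2), 1)
print(pct)
```

Every cell of pct is 20 for this data, because each Set contains exactly one row of each ethnicity.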
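To summarise a numeric variable by a categorical one, you can use by(). The call below groups Age by Ethnicity. To get the mean age of each Set, as the question asks, pass df$Set as the grouping variable instead, i.e. by(df$Age, df$Set, mean):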
by(df$Age, df$Ethnicity, mean)
df$Ethnicity: Asian
[1] 9.5
-------------------------------------------------------------
df$Ethnicity: Black
[1] 10.5
-------------------------------------------------------------
df$Ethnicity: Hispanic
[1] 11.5
-------------------------------------------------------------
df$Ethnicity: Other
[1] 12.5
-------------------------------------------------------------
df$Ethnicity: White
[1] 8.5
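The output above groups Age by Ethnicity. To get the mean age of each Set in a compact table, you can use base R's tapply() for a named vector or aggregate() for a data frame. This is a minimal sketch that assumes the df from the question, and the variable names are illustrative.

```r
# Recreate the example data from the question
df <- data.frame(
  ID = 1:20,
  Ethnicity = rep(c("White", "Asian", "Black", "Hispanic", "Other"), times = 4),
  Age = 1:20,
  Set = rep(c(1, 2, 3, 4), times = 5)
)

# One mean Age per Set, named by the Set value
mean_age <- tapply(df$Age, df$Set, mean)
print(mean_age)  # Sets 1 to 4 have mean ages 9, 10, 11, 12

# Same result as a data frame with columns Set and Age
mean_age_df <- aggregate(Age ~ Set, data = df, FUN = mean)
print(mean_age_df)
```

tapply() is the closest drop-in for by() here. aggregate() is convenient when you want to merge the result with other per-Set summaries.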