The mtcars dataset contains the variable "carb" with the number of carburetors. First I want to find out how many cars have 1, 2, 3, etc. carburetors. I used the dplyr verb count().
library(dplyr)
df <- mtcars
N <- df %>%
  count(carb)
which results in:
> N
  carb  n
1    1  7
2    2 10
3    3  3
4    4 10
5    6  1
6    8  1
Then I want to know how many cars with 1 carb, with 2 carbs, with 3 carbs, etc. have either 4, 6, or 8 cylinders.
For example, I used filter() to find the total number of cars with 1 carb and 4 cylinders:
carb1cyl4 <- df %>%
  filter(carb == 1, cyl == 4) %>%
  count() %>%
  rename(carb1cyl4 = n)
which results in:
  carb1cyl4
1         5
I did the same for 6 and 8 cylinders, with the following results:
  carb1cyl6
1         2
  carb1cyl8
1         0
If I continued this for all carbs, I could combine the results with bind_rows() and bind_cols() and then calculate the share of cars for each carb / cyl combination using mutate(carbXcylX / N), i.e. dividing the number of cars for each carb / cyl combination by the number of cars with the corresponding number of carbs.
The problem is that my real dataset is much larger, so continuing this way would take ages and be error-prone. Is there another way to calculate this?
A glimpse of the final outcome should look like this.
  carb n  perc1cy4  perc1cy6 perc1cy8
1    1 7 0.7142857 0.2857143        0
Thank you in advance!
CodePudding user response:
Using table:
cbind(n = table(mtcars$carb),
      prop.table(with(mtcars, table(carb, cyl)), margin = 1))
#    n         4         6   8
# 1  7 0.7142857 0.2857143 0.0
# 2 10 0.6000000 0.0000000 0.4
# 3  3 0.0000000 0.0000000 1.0
# 4 10 0.0000000 0.4000000 0.6
# 6  1 0.0000000 1.0000000 0.0
# 8  1 0.0000000 0.0000000 1.0
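To spell out what margin = 1 does in the call above: each row of the carb-by-cyl table is divided by its own row total, so every row of proportions sums to 1. Here is a minimal sketch that also turns the result into a data frame with named columns. The perc_cyl prefix is just an illustrative naming choice of mine, not something from the question.

```r
# Counts of cars for every carb/cyl combination
tab <- with(mtcars, table(carb, cyl))

# margin = 1: divide each row (carb level) by its row total
prop <- prop.table(tab, margin = 1)

# Convert to a data frame; the "perc_cyl" prefix is a hypothetical naming choice
props <- as.data.frame.matrix(prop)
names(props) <- paste0("perc_cyl", colnames(prop))

# Add the carb level and the number of cars per carb (the row totals)
res <- cbind(carb = as.numeric(rownames(prop)),
             n    = as.vector(rowSums(tab)),
             props)
res
```

Because the row totals come from the same table, n and the proportions cannot drift out of sync.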
CodePudding user response:
What I'd probably suggest is making a group size column (the number of cars per carb) with something like
group_df <- df %>% count(carb) %>% rename(group_size = n)
Then you can inner join that to the counts per carb / cyl combination
count_df <- df %>% count(carb, cyl) %>% inner_join(group_df, by = "carb")
Then calculate the percentage with
count_df %>% mutate(perc = (n / group_size) * 100)
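The join approach gives one row per carb / cyl combination (long format), while the question asks for one row per carb. A rough, self-contained sketch of how the long result could be reshaped follows. It assumes the tidyr package is available; the names group_df, long, wide, and the perc_cyl prefix are my own illustrative choices. It keeps fractions rather than multiplying by 100, to match the layout shown in the question.

```r
library(dplyr)
library(tidyr)

# Group size: number of cars per carb
group_df <- mtcars %>% count(carb, name = "group_size")

# Long format: one row per carb/cyl combination that actually occurs
long <- mtcars %>%
  count(carb, cyl) %>%
  inner_join(group_df, by = "carb") %>%
  mutate(perc = n / group_size)

# Wide format: one row per carb; combinations that never occur are filled with 0
wide <- long %>%
  select(carb, group_size, cyl, perc) %>%
  pivot_wider(names_from = cyl, values_from = perc,
              names_prefix = "perc_cyl", values_fill = 0)
wide
```

Note that count() only returns combinations present in the data, which is why values_fill = 0 is needed to get an explicit 0 for cases such as 1 carb with 8 cylinders.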
CodePudding user response:
This can be made more succinct, but here's a starting point, using summarise
mtcars %>%
  group_by(carb) %>%
  summarise(n(),
            sum(cyl == 4),
            sum(cyl == 6),
            sum(cyl == 8),
            mean(cyl == 4),
            mean(cyl == 6),
            mean(cyl == 8))
#> # A tibble: 6 x 8
#>    carb `n()` `sum(cyl == 4)` `sum(cyl == 6)` `sum(cyl == 8)` `mean(cyl == 4)` `mean(cyl == 6)` `mean(cyl == 8)`
#>   <dbl> <int>           <int>           <int>           <int>            <dbl>            <dbl>            <dbl>
#> 1     1     7               5               2               0            0.714            0.286                0
#> 2     2    10               6               0               4              0.6                0              0.4
#> 3     3     3               0               0               3                0                0                1
#> 4     4    10               0               4               6                0              0.4              0.6
#> 5     6     1               0               1               0                0                1                0
#> 6     8     1               0               0               1                0                0                1
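The mean(cyl == 4) trick works because cyl == 4 is a logical vector, and the mean of a logical vector is the share of TRUE values. If only the proportions are needed, a trimmed-down sketch with named columns could look like this. The column names perc_cyl4 etc. are just illustrative, and the cylinder values are still hard-coded, so this only suits a small, known set of levels.

```r
library(dplyr)

res <- mtcars %>%
  group_by(carb) %>%
  summarise(
    n         = n(),              # cars per carb
    perc_cyl4 = mean(cyl == 4),   # share of TRUE values = proportion with 4 cylinders
    perc_cyl6 = mean(cyl == 6),
    perc_cyl8 = mean(cyl == 8)
  )
res
```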