R code for creating dataframes including unused categories?

Time:07-27

I am trying to give all my data frames in R the same levels in a categorical column so that the barplots I make from them are comparable, with unused levels shown at a frequency of 0.

Currently I have several separate data frames: one global data frame, plus one per region. Each has a category column and a frequency column. The global data frame contains every category, but each regional data frame only has counts for the categories found in that region. For example...

Global DF

category frequency
red 2
orange 4
yellow 7
green 1
blue 4
purple 4

Current West Region DF

category frequency
orange 2
blue 1
purple 3

Desired West Region DF

category frequency
red 0
orange 2
yellow 0
green 0
blue 1
purple 3

This is all based on the original dataset which looks like:

Region Category
West orange
West orange
West blue
West purple
West purple
West purple
North red
North yellow
... ...

I'm currently using ddply to create the regional DFs, but I can't figure out how to keep the categories with frequency = 0 in each one (as shown in the Desired West Region DF above). Thanks for any insight!

CodePudding user response:

You could convert Category to a factor and count with dplyr's count(), whose .drop = FALSE argument keeps empty factor levels:

library(dplyr)

df |>
  mutate(Category = as.factor(Category)) |>
  count(Region, Category, .drop = FALSE) |>
  filter(Region == "West")

Output:

# A tibble: 6 × 3
  Region Category     n
  <chr>  <fct>    <int>
1 West   blue         1
2 West   green        0
3 West   orange       2
4 West   purple       3
5 West   red          0
6 West   yellow       0

Data:

library(readr)

df <- read_table("Region    Category
West    orange
West    orange
West    blue
West    purple
West    purple
West    purple
North   red
North   yellow
North   green")
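
The output above lists the categories alphabetically, because as.factor() sorts the levels. If the order from the question's global data frame matters for the barplots, one option is to set the levels explicitly. This is a minimal sketch, assuming the sample rows from the Data section; the names category_levels and west are illustrative and not part of the original answer:

```r
library(dplyr)

# Same sample rows as in the Data section, built without readr
df <- data.frame(
  Region   = c(rep("West", 6), rep("North", 3)),
  Category = c("orange", "orange", "blue", "purple", "purple", "purple",
               "red", "yellow", "green")
)

# Assumed display order, copied from the question's global data frame
category_levels <- c("red", "orange", "yellow", "green", "blue", "purple")

west <- df |>
  # Explicit levels fix the order and register categories absent from a region
  mutate(Category = factor(Category, levels = category_levels)) |>
  # .drop = FALSE keeps levels that have no rows, with n = 0
  count(Region, Category, .drop = FALSE) |>
  filter(Region == "West")

west
```

Because the levels are fixed once, every regional result should list the categories in the same order, which keeps the barplots comparable.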

CodePudding user response:

In base R, table() counts every Region/Category combination, including those that never occur:

subset(as.data.frame(table(df)), Region == "West")

Output:

   Region Category Freq
2    West     blue    1
4    West    green    0
6    West   orange    2
8    West   purple    3
10   West      red    0
12   West   yellow    0
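
If you want one frequency data frame per region, as in the question, the same idea can be sketched with split() and table(). This assumes the sample data from the first answer; category_levels and regional are illustrative names, and the column names are chosen to match the question's desired layout:

```r
# Same sample rows as in the first answer's Data section
df <- data.frame(
  Region   = c(rep("West", 6), rep("North", 3)),
  Category = c("orange", "orange", "blue", "purple", "purple", "purple",
               "red", "yellow", "green")
)

# Assumed display order, copied from the question's global data frame
category_levels <- c("red", "orange", "yellow", "green", "blue", "purple")
df$Category <- factor(df$Category, levels = category_levels)

# One data frame per region; table() on a factor counts every level,
# so categories missing from a region appear with frequency 0
regional <- lapply(split(df, df$Region), function(d) {
  counts <- table(d$Category)
  data.frame(category = names(counts), frequency = as.vector(counts))
})

regional$West
```

Each element of regional can then be passed to barplot(), for example with the frequency column as heights and the category column as names.arg.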