Creating a table using two columns with multiple values in R-CodePudding

I have a sample dataset in R as follows:

ID <- c(1:10)
State <- c("AL", "AL", "AL", "TX", "TX", "TX", "TX", "TX", "WY", "WY")
QCT <- c(1, 2, 1, 2, 2, 2, 1, 1, 2, NA)

df <- data.frame(ID, State, QCT)

print(df)

   ID State QCT
1   1    AL   1
2   2    AL   2
3   3    AL   1
4   4    TX   2
5   5    TX   2
6   6    TX   2
7   7    TX   1
8   8    TX   1
9   9    WY   2
10 10    WY  NA

I want an output like the below where the rows denote the State column and the other columns are generated based on the aggregated unique values in the QCT column.

State 1 2 NA
AL    2 1 0
TX    2 3 0
WY    0 1 1

I have so far tried group-by() and summarise, also tried using the pivot_wider function from tidyr and tried the options on this thread Count number of rows per group and add result to original data frame, however, I am unable to get the desired result. This is a hypothetical dataframe, and I have a large data set of 490000 observations, where there are 38 unique values for the State column and 3 unique values (including NA which unfortunately I would need to keep as is) for the QCT column.

Thanks for your time and help!

CodePudding user response：

In dplyr you can do.

I added another column called NA, then I sum all the columns in the summarise function.

library(dplyr)

df |> 
  group_by(State) |> 
  mutate("NA" = is.na(QCT)) |> 
  summarise(`1` = sum(QCT == 1, na.rm = TRUE), 
            `2` = sum(QCT == 2, na.rm = TRUE),
            `NA` = sum(is.na(QCT), na.rm = TRUE), .groups = "drop")


  State   `1`   `2`  `NA`
  <chr> <int> <int> <int>
1 AL        2     1     0
2 TX        2     3     0
3 WY        0     1     1

CodePudding user response：

You can use tabyl from the janitor package:

library(dplyr)
library(janitor)

df[,-1] %>% 
  tabyl(State, QCT) %>% 
  rename("NA" = "NA_") %>% 
  as.data.frame

Or you can use table from base R:

as.data.frame.matrix(table(df[,-1], useNA="ifany"))

Output

  State 1 2 NA
1    AL 2 1  0
2    TX 2 3  0
3    WY 0 1  1

CodePudding user response：

Building off your efforts to use pivot_wider and summarize:

library(tidyverse)
df %>% 
  group_by(State, QCT) %>% # will count by State and QCT
  count() %>% # counts by grouping, stored as "n"
  pivot_wider(names_from = QCT, values_from = n) %>% # gives the count for each state and QCT
  mutate(across(everything(), ~replace_na(.x, 0))) %>% # replace NA with 0
  # You said you wanted State as rownames, but your example shows as column (with name); can omit
  column_to_rownames(., var = "State")

Gives

   1 2 NA
AL 2 1  0
TX 2 3  0
WY 0 1  1