I have a sample dataset in R as follows:
ID <- c(1:10)
State <- c("AL", "AL", "AL", "TX", "TX", "TX", "TX", "TX", "WY", "WY")
QCT <- c(1, 2, 1, 2, 2, 2, 1, 1, 2, NA)
df <- data.frame(ID, State, QCT)
print(df)
ID State QCT
1 1 AL 1
2 2 AL 2
3 3 AL 1
4 4 TX 2
5 5 TX 2
6 6 TX 2
7 7 TX 1
8 8 TX 1
9 9 WY 2
10 10 WY NA
I want an output like the below where the rows denote the State column and the other columns are generated based on the aggregated unique values in the QCT column.
State 1 2 NA
AL 2 1 0
TX 2 3 0
WY 0 1 1
I have so far tried group-by()
and summarise
, also tried using the pivot_wider
function from tidyr
and tried the options on this thread Count number of rows per group and add result to original data frame, however, I am unable to get the desired result.
This is a hypothetical dataframe, and I have a large data set of 490000 observations, where there are 38 unique values for the State
column and 3 unique values (including NA which unfortunately I would need to keep as is) for the QCT
column.
Thanks for your time and help!
CodePudding user response:
In dplyr you can do.
I added another column called NA, then I sum all the columns in the summarise function.
library(dplyr)
df |>
group_by(State) |>
mutate("NA" = is.na(QCT)) |>
summarise(`1` = sum(QCT == 1, na.rm = TRUE),
`2` = sum(QCT == 2, na.rm = TRUE),
`NA` = sum(is.na(QCT), na.rm = TRUE), .groups = "drop")
State `1` `2` `NA`
<chr> <int> <int> <int>
1 AL 2 1 0
2 TX 2 3 0
3 WY 0 1 1
CodePudding user response:
You can use tabyl
from the janitor
package:
library(dplyr)
library(janitor)
df[,-1] %>%
tabyl(State, QCT) %>%
rename("NA" = "NA_") %>%
as.data.frame
Or you can use table
from base R:
as.data.frame.matrix(table(df[,-1], useNA="ifany"))
Output
State 1 2 NA
1 AL 2 1 0
2 TX 2 3 0
3 WY 0 1 1
CodePudding user response:
Building off your efforts to use pivot_wider and summarize:
library(tidyverse)
df %>%
group_by(State, QCT) %>% # will count by State and QCT
count() %>% # counts by grouping, stored as "n"
pivot_wider(names_from = QCT, values_from = n) %>% # gives the count for each state and QCT
mutate(across(everything(), ~replace_na(.x, 0))) %>% # replace NA with 0
# You said you wanted State as rownames, but your example shows as column (with name); can omit
column_to_rownames(., var = "State")
Gives
1 2 NA
AL 2 1 0
TX 2 3 0
WY 0 1 1