I'm trying to generate some descriptive summary tables in R using the CreateTableOne function. I have a series of variables that all have the same response options/levels (Yes or No), and want to generate a wide table where the levels are column headings, like this:
Variable | Yes | No |
---|---|---|
Var1 | 1 | 7 |
Var2 | 5 | 2 |
But CreateTableOne generates nested long tables, with one column for Level where Yes and No are values in rows, like this:
Variable | Level | Value |
---|---|---|
Var1 | Yes | 1 |
Var1 | No | 7 |
Is there a way to pivot the table to get what I want while still using this function, or is there a different function I should be using instead?
Here is my current code:
vars <- c('var1', 'var2')
Table <- CreateTableOne(vars=vars, data=dataframe, factorVars=vars)
Table_exp <- print(Table, showAllLevels = T, varLabels = T, format="f", test=FALSE, noSpaces = TRUE, printToggle = FALSE)
write.csv(Table_exp, file = "Table.csv")
Thanks!
CodePudding user response:
You could use only the pivot_wider
to make that table. Here is your data:
library(tidyverse)
dataframe = data.frame(Variable = c("Var1", "Var1", "Var2", "Var2"),
Level = c("Yes", "No", "Yes", "No"),
Value = c(1, 7, 5, 2))
Your data:
Variable Level Value
1 Var1 Yes 1
2 Var1 No 7
3 Var2 Yes 5
4 Var2 No 2
You can use this code to make the wider table:
dataframe %>%
pivot_wider(names_from = "Level", values_from = "Value")
Output:
# A tibble: 2 × 3
Variable Yes No
<chr> <dbl> <dbl>
1 Var1 1 7
2 Var2 5 2
CodePudding user response:
So I got the answer to this question from a coworker, and it's very similar to what Quinten suggested but with some additional steps to account for the structure of my raw data.
The example tables I provided in my question were my desired outputs, not examples of my raw data. The number values weren't values in my dataset, but rather calculated counts of records, and the solution below includes steps for doing that calculation.
This is what my raw data looks like, and it's actually structured wide:
Participant_ID | Var1 | Var2 | Age |
---|---|---|---|
1 | Yes | No | 20 |
2 | No | No | 30 |
We started by creating a subset with just the relevant variables:
subset <- data |> select(Participant_Id, Var1, Var2)
Then pivoted the data longer first, in order to calculate the counts I wanted in my output table. In this code, we specify that we don't want to pivot Participant_ID and create columns called Vars and Response.
subsetlong <- subset |> pivot_longer(-c("Participant_Id"), names_to = "Vars", values_to="Response")
This is what subsetlong looks like:
Participant_ID | Vars | Response |
---|---|---|
1 | Var1 | Yes |
1 | Var2 | No |
2 | Var1 | No |
2 | Var2 | No |
Then we calculated the counts by Vars, putting that into a new dataframe called counts:
counts <- subsetlong |> group_by(Vars) |> count(Response)
And this is what counts looks like:
Vars | Response | n |
---|---|---|
Var1 | Yes | 1 |
Var1 | No | 7 |
Var2 | Yes | 5 |
Var2 | No | 2 |
Now that the calculation was done, we pivoted this back to wide again, specifying that any NAs should appear as 0s:
counts_wide <- counts |> pivot_wider(names_from="Response", values_from="n", values_fill = 0)
And finally got the desired structure:
Vars | Yes | No |
---|---|---|
Var1 | 1 | 7 |
Var2 | 5 | 2 |