Home > Back-end >  R: Splitting of column into two
R: Splitting of column into two

Time:11-08

Hi i have this table made from the package Publish and the function is called univariatetable.

> > table1 = summary(univariateTable(vaccinated~Sex Agegrp Q(SampleAge) variable1 variable2 etc.etc.
>                                  data = hospital),
>                  show.pvalues=F)

The table i get is something like this that shows frequency and percentage in parantheses (I have just put artificial numbers and not all variables):

Variable LEVEL Vaccinated =1 (n=52) Vaccinated = 0 (n=34)
Sex M 30 (57.7) 60 (54.3)
F 22 (42.3) 46 (34.7)

My wish is to split the columns vaccinated = 1 and vaccinated = 0 into 4 seperate columns : one for the frequency and one for the percentages. How can I do so, while keeping all the other variables in the table?

i have tried this but it doesnt keep all variables from the table

str_split_fixed(table1$`vaccinated = 0 (n=34)`, " ", 2)

CodePudding user response:

Something like this:

library(data.table)
library(dplyr)

ToNumeric <- function(x) {
   as.numeric(gsub(" |\\)", "", x))
}

new_table <- bind_cols(
   tstrsplit(table1$x1, "\\("),
   tstrsplit(table1$x2, "\\(")
) %>% data.frame() %>% apply(2, ToNumeric)

Where table1 is your dataset, and "x1" and "x2" is like "Vaccinated =1 (n=52)" and "Vaccinated = 0 (n=34)". new_table has no colnames, you should set it manually.

CodePudding user response:

Maybe this is worth a try:

Data:

df <- data.frame(
  Variable = c("Sex", NA),  
  LEVEL = c("M", "F"),
  `Vaccinated = 1 (n=52)` = c("30 (57.7)", "60 (54.3)"),    
  `Vaccinated = 0 (n=34)` = c("22 (42.3)", "46 (34.7)"))

Now pivot_longerand then extract the relevant data into columns:

library(dplyr)
library(tidyr)
df %>%
  pivot_longer(cols = 3:4, names_to = 'Vacc') %>%
  extract(col = Vacc,
          into = c("Num_vacc", "Total"),
          regex = "\\D (\\d)\\D (\\d )",
          convert = TRUE) %>%
  extract(col = value,
          into = c("Freq", "Pctg"),
          regex = "(\\d )\\D ([\\d.] )",
          convert = TRUE)
# A tibble: 4 x 6
  Variable LEVEL Num_vacc Total  Freq  Pctg
  <chr>    <chr>    <int> <int> <int> <dbl>
1 Sex      M            1    52    30  57.7
2 Sex      M            0    34    22  42.3
3 NA       F            1    52    60  54.3
4 NA       F            0    34    46  34.7
  • Related