How can I divide one variable into two variables in R?

I have a variable x which can take five values (0, 1, 2, 3, 4). I want to divide it into two variables: variable 1 should contain the value 0, and variable 2 should contain the values 1, 2, 3 and 4. I'm sure this is easy, but I can't figure out what I need to do.

What my data looks like:

|variable x|
|-----------|
|0|
|1|
|0|
|4|
|3|
|0|
|0|
|2|

So I get the table:

0	1	2	3	4
125	34	14	15	15

But I want my data to look like this:

variable 1
125

variable 2
78

So variable 1 should contain how often 0 occurs in my data, and variable 2 should contain the sum of how often 1, 2, 3 and 4 occur.

CodePudding user response：

You can convert the variable to logical by testing whether x == 0:

x <- c(0, 1, 0, 4, 3, 0, 0, 2)

table(x)
#> x
#> 0 1 2 3 4 
#> 4 1 1 1 1 

table(x == 0)
#> FALSE  TRUE 
#>     4     4
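If the goal is literally two separate count variables, the same logical test can be summed directly. This is only a minimal sketch: the vector is rebuilt from the frequencies given in the question, and the names variable1 and variable2 are placeholders.

```r
# Frequencies taken from the question: 125 zeros, then 34, 14, 15 and 15
x <- rep(0:4, times = c(125, 34, 14, 15, 15))

variable1 <- sum(x == 0)   # how often 0 occurs
variable2 <- sum(x != 0)   # how often 1, 2, 3 or 4 occur

variable1
#> [1] 125
variable2
#> [1] 78
```

If x can contain missing values, adding na.rm = TRUE to both sum() calls would probably be needed.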

If you want the exact headings, you can do:

# table(x != 0) lists the zero count (FALSE) first, so the names line up
setNames(table(x != 0), c(0, paste(unique(sort(x[x != 0])), collapse = ",")))
#>     0   1,2,3,4 
#>     4         4

And if you want to recode the variable itself into two labels, you could do:

data.frame(x = c("zero", "not zero")[1 + (x != 0)])
#>          x
#> 1     zero
#> 2 not zero
#> 3     zero
#> 4 not zero
#> 5 not zero
#> 6     zero
#> 7     zero
#> 8 not zero

Created on 2022-04-02 by the reprex package (v2.0.1)
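For an actual factor rather than plain labels, one possible sketch is below. The names group and counts are made up for illustration, and fixing the level order is an assumption about what is wanted: it keeps "zero" first in later summaries.

```r
x <- c(0, 1, 0, 4, 3, 0, 0, 2)

# Label each observation, then fix the level order so "zero" comes first
group <- factor(ifelse(x == 0, "zero", "not zero"),
                levels = c("zero", "not zero"))

# Tabulating the factor gives the two counts in that order
counts <- table(group)
counts
```

With this example vector both groups contain four observations.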

CodePudding user response：

base R

You can use cbind:

# note: this sample also contains the value 5, which the question's data does not
x = sample(0:5, 200, replace = TRUE)
table(x)
# x
#  0  1  2  3  4  5 
# 29 38 41 35 27 30

cbind(`0` = table(x)[1], `1,2,3,4` = sum(table(x)[2:5]))
#    0 1,2,3,4
# 0 29     141
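Indexing the table by position, as above, assumes that 0 is present and is the first entry. A sketch that selects by name instead might look like the following. It assumes the possible values are 0 to 4, as in the question, and the names tab, is_zero and result are illustrative.

```r
x <- c(0, 1, 0, 4, 3, 0, 0, 2)

# Declaring the levels up front guarantees a "0" entry even if no zero occurs
tab <- table(factor(x, levels = 0:4))

# Pick the entries by name rather than by position
is_zero <- names(tab) == "0"
result <- c(`0` = sum(tab[is_zero]), `1,2,3,4` = sum(tab[!is_zero]))
result
```

For this small vector both counts come out as 4.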

tidyverse

library(tidyverse)
ta = as.data.frame(t(as.data.frame.array(table(x))))
ta %>% 
  mutate(!!paste(names(.[-1]), collapse = ",") := sum(c_across(`1`:`5`)), .keep = "unused")

#    0 1,2,3,4,5
# 1 29       171

CodePudding user response：

Beginning with the vector, we can get the frequency from table then put it into a dataframe. Then, we can create a new column with the names collapsed (i.e., 1,2,3,4) and get the row sum for all columns except the first one.

library(tidyverse)

tab <- data.frame(value = c(0, 1, 2, 3, 4),
                  freq = c(125, 34, 14, 15, 15))
x <- rep(tab$value, tab$freq)

output <- data.frame(rbind(table(x))) %>%
  rename_with(~str_remove(., 'X')) %>%
  mutate(!!paste0(names(.)[-1], collapse = ",") := rowSums(select(., -1))) %>%
  select(1, last_col())

Output

    0 1,2,3,4
1 125      78
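If the frequency table is already available, expanding it back into a vector may not be necessary. A base R sketch that collapses the frequencies directly could look like this. The names group and collapsed are hypothetical, and the label "1,2,3,4" is hard-coded here rather than derived from the data.

```r
tab <- data.frame(value = 0:4, freq = c(125, 34, 14, 15, 15))

# Label each row as either the zero group or the combined non-zero group
group <- ifelse(tab$value == 0, "0", "1,2,3,4")

# Add up the frequencies within each group
collapsed <- tapply(tab$freq, group, sum)
collapsed
```

This should give 125 for the zero group and 78 for the combined group, matching the output above.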

Then, to create the 2 variables in 2 dataframes, you can split the columns into a list, change the names, then put into the global environment.

list2env(setNames(
  split.default(output, seq_along(output)),
  c("variable 1", "variable 2")
), envir = .GlobalEnv)
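Because "variable 1" and "variable 2" are not syntactic R names, they have to be wrapped in backticks or fetched with get(). The sketch below illustrates that in a separate environment so the global workspace stays untouched. The environment name vars is an assumption, output is rebuilt by hand, and as.list() stores plain vectors rather than the one-column data frames that split.default() produces.

```r
# Rebuild the one-row result from above by hand
output <- data.frame(`0` = 125, `1,2,3,4` = 78, check.names = FALSE)

# A dedicated environment avoids writing into .GlobalEnv
vars <- new.env()
list2env(setNames(as.list(output), c("variable 1", "variable 2")), envir = vars)

# Non-syntactic names need backticks or get()
vars$`variable 1`
get("variable 2", envir = vars)
```

The same backtick rule applies to objects created in the global environment.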

Or you could just subset:

variable1 <- data.frame(`variable 1` = output$`0`, check.names = FALSE)
variable2 <- data.frame(`variable 2` = output$`1,2,3,4`, check.names = FALSE)

CodePudding user response：

Update (first answer deleted). With df defined as in the data section at the end:

df[paste(names(df[2:5]), collapse = ",")] <- rowSums(df[2:5])
df[, c(1,6)]

# A tibble: 1 × 2
    `0` `1,2,3,4`
  <dbl>     <dbl>
1   125        78

data:

df <- structure(list(`0` = 125, `1` = 34, `2` = 14, `3` = 15, `4` = 15), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -1L))