I have a variable x which can take five values (0, 1, 2, 3, 4). I want to divide the variable into two variables. Variable 1 is supposed to cover the value 0, and variable 2 is supposed to cover the values 1, 2, 3 and 4. I'm sure this is easy, but I can't figure out what I need to do.
What my data looks like:
|variable x|
|-----------|
|0|
|1|
|0|
|4|
|3|
|0|
|0|
|2|
So I get the table:
0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|
125 | 34 | 14 | 15 | 15 |
But I want my data to look like this:
variable 1 |
---|
125 |
variable 2 |
---|
78 |
So variable 1 is supposed to contain how often 0 occurs in my data,
and variable 2 is supposed to contain the sum of how often 1, 2, 3 and 4 occur in my data.
CodePudding user response:
You can convert the variable to logical by testing whether x == 0:
x <- c(0, 1, 0, 4, 3, 0, 0, 2)
table(x)
#> x
#> 0 1 2 3 4
#> 4 1 1 1 1
table(x == 0)
#> FALSE TRUE
#> 4 4
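As a rough check against the numbers in the question, the same comparison can be applied to a vector rebuilt from the reported counts. This is only a sketch: `x_full`, `n_zero` and `n_non_zero` are illustrative names, and the vector is reconstructed from the question's frequency table rather than taken from the real data.

```r
# Rebuild a vector with the counts reported in the question
# (assumption: 125 zeros, then 34, 14, 15 and 15 of the values 1 to 4).
x_full <- rep(0:4, times = c(125, 34, 14, 15, 15))

# "variable 1": how often the value 0 occurs.
n_zero <- sum(x_full == 0)

# "variable 2": how often any of the other values occurs.
n_non_zero <- sum(x_full != 0)

n_zero      # 125
n_non_zero  # 78
```

Summing a logical vector counts its TRUE entries, so no intermediate table is needed when only the two totals are wanted.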
If you want the exact headings, you can do:
# table(x != 0) lists FALSE (the zeros) first, so the labels line up with the counts
setNames(table(x != 0), c(0, paste(unique(sort(x[x != 0])), collapse = ",")))
#> 0 1,2,3,4
#> 4 4
And if you want to change the variable to a factor you could do:
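# 1 + (x != 0) is 1 for zeros and 2 otherwise, so it indexes into the two labels
data.frame(x = factor(c("zero", "not zero")[1 + (x != 0)]))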
#> x
#> 1 zero
#> 2 not zero
#> 3 zero
#> 4 not zero
#> 5 not zero
#> 6 zero
#> 7 zero
#> 8 not zero
Created on 2022-04-02 by the reprex package (v2.0.1)
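A possible variation, if a labelled grouping and its counts are wanted in one step, is to build the factor directly from the logical test and tabulate it. This is a minimal sketch; `grp` and `counts` are names made up for the example.

```r
x <- c(0, 1, 0, 4, 3, 0, 0, 2)

# FALSE (x is 0) maps to "zero", TRUE (x is not 0) maps to "not zero".
grp <- factor(x != 0, levels = c(FALSE, TRUE), labels = c("zero", "not zero"))

# Frequency of each group, in the order given by `labels`.
counts <- table(grp)
counts
```

Setting `levels` explicitly fixes the order of the groups, and a group with no observations would still show up with a count of 0.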
CodePudding user response:
base R
You can use cbind:
x = sample(0:5, 200, replace = TRUE)  # note: 0:5 includes a value (5) that is not in the question's data
table(x)
# x
# 0 1 2 3 4 5
# 29 38 41 35 27 30
cbind(`0` = table(x)[1], `1,2,3,4` = sum(table(x)[2:5]))
# 0 1,2,3,4
# 0 29 141
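Positional indexing such as table(x)[2:5] relies on every value occurring at least once, otherwise the positions shift. A hedged alternative is to fix the levels up front and index the table by name. In the sketch below the seed, the 0:4 range and the names `tab` and `res` are assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible
x <- sample(0:4, 200, replace = TRUE)

# Fixing the levels guarantees one cell per value, even for a value that never occurs.
tab <- table(factor(x, levels = 0:4))

# Index by name instead of by position.
res <- c(`0` = as.vector(tab["0"]),
         `1,2,3,4` = sum(tab[as.character(1:4)]))
res
```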
tidyverse
library(tidyverse)
ta = as.data.frame(t(as.data.frame.array(table(x))))
ta %>%
mutate(!!paste(names(.[-1]), collapse = ",") := sum(c_across(`1`:`5`)), .keep = "unused")
# 0 1,2,3,4,5
# 1 29 171
CodePudding user response:
Beginning with the vector, we can get the frequency from table, then put it into a dataframe. Then, we can create a new column with the names collapsed (i.e., 1,2,3,4) and get the row sum for all columns except the first one.
library(tidyverse)
tab <- data.frame(value=c(0, 1, 2, 3, 4),
freq=c(125, 34, 14, 15, 15))
x <- rep(tab$value, tab$freq)
output <- data.frame(rbind(table(x))) %>%
rename_with(~str_remove(., 'X')) %>%
mutate(!!paste0(names(.)[-1], collapse = ",") := rowSums(select(., -1))) %>%
select(1, last_col())
Output
0 1,2,3,4
1 125 78
Then, to create the 2 variables in 2 dataframes, you can split the columns into a list, change the names, then put them into the global environment.
list2env(setNames(
split.default(output, seq_along(output)),
c("variable 1", "variable 2")
), envir = .GlobalEnv)
Or you could just subset:
variable1 <- data.frame(`variable 1` = output$`0`, check.names = FALSE)
variable2 <- data.frame(`variable 2` = output$`1,2,3,4`, check.names = FALSE)
CodePudding user response:
Update: deleted first answer:
df[paste(names(df[2:5]), collapse = ",")] <- rowSums(df[2:5])
df[, c(1,6)]
# A tibble: 1 × 2
`0` `1,2,3,4`
<dbl> <dbl>
1 125 78
data:
df <- structure(list(`0` = 125, `1` = 34, `2` = 14, `3` = 15, `4` = 15), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -1L))
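If the counts are held in a plain named vector rather than a tibble, the same collapse can be sketched in base R. The names `freq` and `collapsed` are hypothetical, and the numbers are the ones from the question.

```r
# Counts per value, as reported in the question.
freq <- c(`0` = 125, `1` = 34, `2` = 14, `3` = 15, `4` = 15)

# Keep the first count as is and sum the rest under a collapsed name.
collapsed <- c(freq["0"],
               setNames(sum(freq[-1]), paste(names(freq)[-1], collapse = ",")))
collapsed
#>       0 1,2,3,4 
#>     125      78
```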