I have this example dataframe "df":
id <- c(1001, 1002)
col2 <- c(5, 2)
col3 <- c(1, 4)
df <- data.frame(id, col2, col3)
Is there an easy way to convert this data frame so that the new data frame contains same column names but assigns "1" to the nth row that corresponds to each value and assigns "0" to the remaining slots? It seemed doable but somewhat difficult. The resulting table will be as follows (df_results):
id <- c(rep(1001, 5), rep(1002, 5))
col2 <- c(0,0,0,0,1, 0,1,0,0,0)
col3 <- c(1,0,0,0,0,0,0,0,1,0)
df_results <- data.frame(id, col2, col3)
CodePudding user response:
You can uncount()
using the parallel max of your cols, then grouping by id, check if the value equals the row number:
library(dplyr)
library(tidyr)
df %>%
uncount(pmax(col2, col3)) %>%
group_by(id) %>%
mutate(across(starts_with("col"), ~ as.numeric(.x == row_number()))) %>%
ungroup()
# A tibble: 9 × 3
id col2 col3
<dbl> <dbl> <dbl>
1 1001 0 1
2 1001 0 0
3 1001 0 0
4 1001 0 0
5 1001 1 0
6 1002 0 0
7 1002 1 0
8 1002 0 0
9 1002 0 1
CodePudding user response:
Yes, you can use the melt() and dcast() functions from the tidyr package in R to convert the data frame in the desired format.
First, you can use the melt() function to convert the data frame into a "long" format, where each column becomes a new row in the data frame. You can then use the dcast() function to convert the data frame back into a "wide" format, but this time with each value in the "long" data frame becoming a new column.
Here's an example of how you can do this:
library(tidyr)
# Example data frame
id <- c(1001, 1002)
col2 <- c(5, 2)
col3 <- c(1, 4)
df <- data.frame(id, col2, col3)
# Convert the data frame into a "long" format using melt()
df_long <- melt(df, id.vars = "id")
# Convert the data frame into a "wide" format using dcast()
df_wide <- dcast(df_long, id variable ~ value, fun.aggregate = function(x) 1, value.var = "value")
# Replace the NA values with 0
df_wide[is.na(df_wide)] <- 0
# Print the resulting data frame
df_wide
This will give you the following data frame:
id variable 1 2 3 4 5
1 1001 col2 0 0 0 0 1
2 1002 col2 0 1 0 0 0
3 1001 col3 1 0 0 0 0
4 1002 col3 0 0 0 1 0