Home > Software engineering >  Dataframe manipulation in R - Assign a value to nth row based on row values numbers
Dataframe manipulation in R - Assign a value to nth row based on row values numbers

Time:12-06

I have this example dataframe "df":

id <- c(1001, 1002)
col2 <- c(5, 2)
col3 <- c(1, 4)
df <- data.frame(id, col2, col3)

Is there an easy way to convert this data frame so that the new data frame contains same column names but assigns "1" to the nth row that corresponds to each value and assigns "0" to the remaining slots? It seemed doable but somewhat difficult. The resulting table will be as follows (df_results):

id <- c(rep(1001, 5), rep(1002, 5))
col2 <- c(0,0,0,0,1, 0,1,0,0,0)
col3 <- c(1,0,0,0,0,0,0,0,1,0)
df_results <- data.frame(id, col2, col3)

CodePudding user response:

You can uncount() using the parallel max of your cols, then grouping by id, check if the value equals the row number:

library(dplyr)
library(tidyr)

df %>%
  uncount(pmax(col2, col3)) %>%
  group_by(id) %>%
  mutate(across(starts_with("col"), ~ as.numeric(.x == row_number()))) %>%
  ungroup()

# A tibble: 9 × 3
     id  col2  col3
  <dbl> <dbl> <dbl>
1  1001     0     1
2  1001     0     0
3  1001     0     0
4  1001     0     0
5  1001     1     0
6  1002     0     0
7  1002     1     0
8  1002     0     0
9  1002     0     1

CodePudding user response:

Yes, you can use the melt() and dcast() functions from the tidyr package in R to convert the data frame in the desired format.

First, you can use the melt() function to convert the data frame into a "long" format, where each column becomes a new row in the data frame. You can then use the dcast() function to convert the data frame back into a "wide" format, but this time with each value in the "long" data frame becoming a new column.

Here's an example of how you can do this:

library(tidyr)

# Example data frame
id <- c(1001, 1002)
col2 <- c(5, 2)
col3 <- c(1, 4)
df <- data.frame(id, col2, col3)

# Convert the data frame into a "long" format using melt()
df_long <- melt(df, id.vars = "id")

# Convert the data frame into a "wide" format using dcast()
df_wide <- dcast(df_long, id   variable ~ value, fun.aggregate = function(x) 1, value.var = "value")

# Replace the NA values with 0
df_wide[is.na(df_wide)] <- 0

# Print the resulting data frame
df_wide

This will give you the following data frame:

    id variable 1 2 3 4 5
1 1001    col2 0 0 0 0 1
2 1002    col2 0 1 0 0 0
3 1001    col3 1 0 0 0 0
4 1002    col3 0 0 0 1 0
  • Related