R data transformation: create new rows based on numbers in different columns

Time:11-03

I am looking for a way to transform the following data frame using dplyr:

item <- c('A','B','C')
one <- c(2, 1, 2)
two <- c(1,1,2)
data <- data.frame(item,one,two)
data
item one two
A 2 1
B 1 1
C 2 2

The column "one" contains the number of ratings with the value 1, and the column "two" contains the number of ratings with the value 2. My ideal data frame after the transformation would look like this:

item rating
A 1
A 1
A 2
B 1
B 2
C 1
C 1
C 2
C 2

Any idea how I could get to this output (it doesn't have to be dplyr)? I know how to use pivot_longer of the tidyr package but that doesn't solve the problem of repeating the number of rows...

CodePudding user response:

library(dplyr)
library(tidyr) # pivot_longer
nums <- c(one = 1, two = 2, three = 3)
data %>%
  pivot_longer(-item) %>%
  group_by(item) %>%
  # repeat each column name `value` times (dplyr >= 1.1.0 prefers reframe() for multi-row results)
  summarize(rating = rep(name, times = value)) %>%
  ungroup() %>%
  mutate(rating = nums[rating])
# # A tibble: 9 x 2
#   item  rating
#   <chr>  <dbl>
# 1 A          1
# 2 A          1
# 3 A          2
# 4 B          1
# 5 B          2
# 6 C          1
# 7 C          1
# 8 C          2
# 9 C          2

I had to define nums because, in my haste, I couldn't find an easy programmatic way to convert "one" to 1. Make sure the lookup vector extends at least as far as you need: I added three = 3 for demonstration, but if you really only have one and two, the code works as-is.

(Related to that topic: Convert written number to number in R)
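The "repeat each row by its count" step can also be handed to tidyr::uncount(), which duplicates rows according to a weight column. The following is only a sketch under a few assumptions: the intermediate column name n is made up for illustration, and match() converts the names to numbers by position, so it relies on the rating columns being exactly "one" and "two" in that order.

```r
library(dplyr)
library(tidyr)

data <- data.frame(item = c("A", "B", "C"), one = c(2, 1, 2), two = c(1, 1, 2))

result <- data %>%
  # wide to long: one row per item/rating pair, with the count in `n` (illustrative name)
  pivot_longer(-item, names_to = "rating", values_to = "n") %>%
  # "one" -> 1, "two" -> 2, by position in the lookup vector (assumed column names)
  mutate(rating = match(rating, c("one", "two"))) %>%
  # repeat each row `n` times; uncount() drops the weight column by default
  uncount(n)

result
```

If there are more rating columns, extend the lookup vector passed to match() accordingly, just as with nums above.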

CodePudding user response:

You could convert the data from wide to long format with the gather() function, repeat each row according to its count, and then replace the string values "one" and "two" with integers:

library(tidyverse)
item <- c('A','B','C')
one <- c(2, 1, 2)
two <- c(1,1,2)
data <- data.frame(item,one,two)
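# Wide to long: one row per item/rating pair, together with its count
long_df <- gather(data, rating, count, one:two)
new_df <- tibble()

# Loop over every row of the long data and repeat that single row `count` times
for (i in seq_len(nrow(long_df))) {
  new_df <- rbind(new_df, do.call("rbind", replicate(long_df[i, "count"], long_df[i, ], simplify = FALSE)))
}

# Replace "one"/"two" with 1/2, drop the count column, and sort by item
new_df <- new_df %>%
  mutate(rating = match(rating, c("one", "two"))) %>%
  select(-count) %>%
  arrange(item, rating)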
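Since the question says it doesn't have to be dplyr, the same idea can be sketched in base R without a loop, because rep() accepts a vector of repeat counts. This assumes the rating columns are named "one" and "two" and stand for the ratings 1 and 2 in that order; the helper name counts is only illustrative.

```r
data <- data.frame(item = c("A", "B", "C"), one = c(2, 1, 2), two = c(1, 1, 2))

# Count matrix: one row per item, one column per rating value (assumed order: 1, 2)
counts <- as.matrix(data[, c("one", "two")])

result <- data.frame(
  # each item appears once per rating it received in total
  item = rep(data$item, times = rowSums(counts)),
  # walk the counts item by item: for each item, its 1s first, then its 2s
  rating = rep(rep(seq_len(ncol(counts)), times = nrow(counts)),
               times = as.vector(t(counts)))
)

result
```

Transposing counts before flattening keeps the rows grouped by item, so no extra sorting step is needed.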