Home > OS >  How to repeat by number by multiple columns and divide back
How to repeat by number by multiple columns and divide back

Time:10-13

Let's say I have this dataframe:

> df <- data.frame(A=1:5, B=c(0, 0, 3, 0, 0), C=c(1, 0, 0, 1, 0), D=c(0, 2, 0, 0, 1))
> df
  A B C D
1 1 0 1 0
2 2 0 0 2
3 3 3 0 0
4 4 0 1 0
5 5 0 0 1

How would I go about converting it to:

  A B C D
1 1 0 1 0
2 2 0 0 1
3 2 0 0 1
4 3 1 0 0
5 3 1 0 0
6 3 1 0 0
7 4 0 1 0
8 5 0 0 1

As you can see there are value 2 and 3, I want to repeat them by that length and change the values back to 1. How would I do that?

I also want to duplicate the the A column as you can see.

I tried:

replace(df[rep(rownames(df), select(df, -A)),], 2, 1)

But it gives me an error. I am very new to R.

CodePudding user response:

One option would be to get max value from columns B, C and D using pmax, use uncount to repeat the rows. Use pmin to replace the values greater than 1 to 1.

library(dplyr)
library(tidyr)

df %>%
  mutate(repeat_row = pmax(B, C, D)) %>%
  uncount(repeat_row) %>%
  mutate(across(-A, pmin, 1))

#  A B C D
#1 1 0 1 0
#2 2 0 0 1
#3 2 0 0 1
#4 3 1 0 0
#5 3 1 0 0
#6 3 1 0 0
#7 4 0 1 0
#8 5 0 0 1

CodePudding user response:

Just to modify Ronak Shah's answer a bit, I realized you could simply just do it with only dplyr:

library(dplyr)

df[rep(rownames(df), apply(select(df, -A), 1, max)),] %>%
  as.data.frame(row.names=1:nrow(.)) %>%
  mutate(across(-A, pmin, 1))

Output:

  A B C D
1 1 0 1 0
2 2 0 0 1
3 2 0 0 1
4 3 1 0 0
5 3 1 0 0
6 3 1 0 0
7 4 0 1 0
8 5 0 0 1
  • Related