Let's say I have this dataframe:
> df <- data.frame(A=1:5, B=c(0, 0, 3, 0, 0), C=c(1, 0, 0, 1, 0), D=c(0, 2, 0, 0, 1))
> df
A B C D
1 1 0 1 0
2 2 0 0 2
3 3 3 0 0
4 4 0 1 0
5 5 0 0 1
How would I go about converting it to:
A B C D
1 1 0 1 0
2 2 0 0 1
3 2 0 0 1
4 3 1 0 0
5 3 1 0 0
6 3 1 0 0
7 4 0 1 0
8 5 0 0 1
As you can see there are value 2
and 3
, I want to repeat them by that length and change the values back to 1. How would I do that?
I also want to duplicate the the A
column as you can see.
I tried:
replace(df[rep(rownames(df), select(df, -A)),], 2, 1)
But it gives me an error.
CodePudding user response:
One option would be to get max value from columns B
, C
and D
using pmax
, use uncount
to repeat the rows. Use pmin
to replace the values greater than 1 to 1.
library(dplyr)
library(tidyr)
df %>%
mutate(repeat_row = pmax(B, C, D)) %>%
uncount(repeat_row) %>%
mutate(across(-A, pmin, 1))
# A B C D
#1 1 0 1 0
#2 2 0 0 1
#3 2 0 0 1
#4 3 1 0 0
#5 3 1 0 0
#6 3 1 0 0
#7 4 0 1 0
#8 5 0 0 1
CodePudding user response:
Apparently, there's just one value > 0 in columns B to D, so we can exploit the partial rowSums
for a replicate
call on columns B to D binarized using > 0
. So that we can use this in Map
, we t
ranspose twice. Rest is cosmetics.
t(do.call(cbind, Map(replicate,
rowSums(df[-1]),
as.data.frame(t(cbind(df[1], df[-1] > 0)))))) |>
as.data.frame() |>
setNames(names(df))
# A B C D
# 1 1 0 1 0
# 2 2 0 0 1
# 3 2 0 0 1
# 4 3 1 0 0
# 5 3 1 0 0
# 6 3 1 0 0
# 7 4 0 1 0
# 8 5 0 0 1
Note: R>=4.1 used.
CodePudding user response:
Just to modify Ronak Shah's answer a bit, I realized you could simply just do it with only dplyr
:
library(dplyr)
df[rep(rownames(df), apply(select(df, -A), 1, max)),] %>%
as.data.frame(row.names=1:nrow(.)) %>%
mutate(across(-A, pmin, 1))
Output:
A B C D
1 1 0 1 0
2 2 0 0 1
3 2 0 0 1
4 3 1 0 0
5 3 1 0 0
6 3 1 0 0
7 4 0 1 0
8 5 0 0 1
Or with rowSums
:
library(dplyr)
df[rep(rownames(df), rowSums(select(df, -A)),] %>%
as.data.frame(row.names=1:nrow(.)) %>%
mutate(across(-A, pmin, 1))