I have a data frame:
df <- data.frame(ID = c(1, 2, 3, 4, 5, 5, 7, 8),
                 var1 = c('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'),
                 var2 = c(1, 1, 0, 0, 1, 1, 0, 0),
                 var3 = c(50, 40, 30, 45, 33, 51, 70, 46))
I would like to set var2 to 0.3 for 25% of the rows, using:
library(dplyr)
df %>%
  mutate(var2 = case_when(sample(n()) <= n() * 0.25 ~ 0.3,
                          TRUE ~ var2))
However, I would like the 25% of rows to be selected in descending order of var3 rather than at random, so that the output is:
ID var1 var2 var3
1 1 a 1 50
2 2 b 1 40
3 3 c 0 30
4 4 d 0 45
5 5 e 1 33
6 5 f 0.3 51
7 7 g 0.3 70
8 8 h 0 46
Here rows 6 and 7 have been modified because they have the second-highest and highest values of var3. It should work so that I can vary the percentage of rows that are modified, but the changes should always be applied in descending order of var3.
Thank you in advance.
CodePudding user response:
Here's one way (note that this picks the 25% of rows at random, not by var3):
set.seed(42)
df %>%
mutate(var2 = if_else(row_number() %in% sample(n(), size = ceiling(n()/4)), 0.3, var2))
# ID var1 var2 var3
# 1 1 a 0.3 50
# 2 2 b 1.0 40
# 3 3 c 0.0 30
# 4 4 d 0.0 45
# 5 5 e 0.3 33
# 6 5 f 1.0 51
# 7 7 g 0.0 70
# 8 8 h 0.0 46
CodePudding user response:
Solution using arrange, then returning to the previous ordering.
df %>%
  mutate(Row = row_number()) %>%
  arrange(desc(var3)) %>%
  mutate(Magnitude_index = row_number(),
         var2 = if_else(Magnitude_index <= n() * 0.25, 0.3, var2)
  ) %>%
  arrange(Row) %>%
  select(any_of(names(df)))
ID var1 var2 var3
1 1 a 1.0 50
2 2 b 1.0 40
3 3 c 0.0 30
4 4 d 0.0 45
5 5 e 1.0 33
6 5 f 0.3 51
7 7 g 0.3 70
8 8 h 0.0 46
CodePudding user response:
Here is a solution that sorts your data frame by var3 and checks whether row_number() is less than or equal to 25% of the total number of rows:
df %>%
  arrange(desc(var3)) %>%
  mutate(var2 = ifelse(row_number() <= 0.25 * nrow(df), 0.3, var2)) %>%
  arrange(ID)
Output:
ID var1 var2 var3
1 1 a 1.0 50
2 2 b 1.0 40
3 3 c 0.0 30
4 4 d 0.0 45
5 5 f 0.3 51
6 5 e 1.0 33
7 7 g 0.3 70
8 8 h 0.0 46
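Note that arrange(ID) only restores the original order when ID is unique. Here ID 5 appears twice, which is why rows e and f come out swapped in the output above. A minimal sketch that avoids re-sorting altogether and makes the share adjustable might look like the following. The helper name set_top_share and its arguments prop and value are made up for illustration, and it uses base R's order() instead of dplyr:

```r
df <- data.frame(ID = c(1, 2, 3, 4, 5, 5, 7, 8),
                 var1 = c('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'),
                 var2 = c(1, 1, 0, 0, 1, 1, 0, 0),
                 var3 = c(50, 40, 30, 45, 33, 51, 70, 46))

# Hypothetical helper (not from the answers above): set var2 to `value`
# for the top `prop` share of rows, ranked by descending var3.
set_top_share <- function(data, prop, value = 0.3) {
  # Number of rows to modify, rounded down (matches `row_number() <= n() * prop`)
  n_top <- floor(nrow(data) * prop)
  # Row positions of the n_top largest var3 values
  top_rows <- order(data$var3, decreasing = TRUE)[seq_len(n_top)]
  # Modify in place, so the original row order is never disturbed
  data$var2[top_rows] <- value
  data
}

# 25% of 8 rows = 2 rows: rows 6 and 7 (var3 = 51 and 70) get var2 = 0.3
print(set_top_share(df, 0.25))
```

Changing the share is then just a matter of calling, for example, set_top_share(df, 0.5). Since the rows are indexed by position and never reordered, duplicate IDs are not a problem.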