Home > Software design >  Mutate a random lead variable
Mutate a random lead variable

Time:11-12

I am trying to find a way to do the following:

Find the string "John" in column "Zone", then if the previous cell above it with a value (some rows are empty) in "Zone" column was the string "Four", replace that string with the word "Five".

I could do this with a mutate/lead function but the space between the data is not uniform, and I can't figure out how to count back the required rows.

My next thought was I could extract only the rows from the "Zone" column into a new dataframe so that the data was more uniform, and then mutate as required and then merge the data back into the original dataframe. This became too messy and I couldn't make it work either.

Can anyone suggest a way to achieve this? Any help would be greatly appreciated.

Sample input:

structure(list(City = c("Mumbai", "Delhi", "Bangalore", "Red", 
"Hyderabad", "Ahmedabad", "Chennai", "Kolkata", "Surat", "Pune", 
"Yellow", "Jaipur", "Lucknow", "Kanpur", "Nagpur", "Indore", 
"Blue", "Bhopal", "Pimpri-Chinchwad", "Patna", "Vadodara", "Green"
), Pop = c("12442373", "11007835", "8425970", "31876178", "6809970", 
"5570585", "4681087", "4486679", "4467797", "3115431", "12069907", 
"3046163", "2815601", "2767031", "2405665", "1960631", "7133327", 
"1795648", "1727692", "1684222", "1670806", "5082720"), Zone = c("One", 
"", "", "Tony", "Two", "", "", "", "", "", "John", "Three", "", 
"", "", "", "Mary", "Four", "", "", "", "John"), Test = c("Yes", 
"", "", "", "No", "", "", "", "", "", "", "Yes", "", "", "", 
"", "", "No", "", "", "", "")), row.names = 1:22, class = "data.frame")

Desired output

structure(list(City = c("Mumbai", "Delhi", "Bangalore", "Red", 
"Hyderabad", "Ahmedabad", "Chennai", "Kolkata", "Surat", "Pune", 
"Yellow", "Jaipur", "Lucknow", "Kanpur", "Nagpur", "Indore", 
"Blue", "Bhopal", "Pimpri-Chinchwad", "Patna", "Vadodara", "Green"
), Pop = c("12442373", "11007835", "8425970", "31876178", "6809970", 
"5570585", "4681087", "4486679", "4467797", "3115431", "12069907", 
"3046163", "2815601", "2767031", "2405665", "1960631", "7133327", 
"1795648", "1727692", "1684222", "1670806", "5082720"), Zone = c("One", 
"", "", "Tony", "Two", "", "", "", "", "", "John", "Three", "", 
"", "", "", "Mary", "Five", "", "", "", "John"), Test = c("Yes", 
"", "", "", "No", "", "", "", "", "", "", "Yes", "", "", "", 
"", "", "No", "", "", "", "")), row.names = 1:22, class = "data.frame")

CodePudding user response:

I've made it work like this. It's not elegant, but I suppose if it works it works.

test = filter(input, Zone != "")
answer <- test %>%
  mutate(Zone1 = if_else(lead(Zone, default = first(Zone)) == "John", "Five", Zone))
answer = answer %>% dplyr::mutate(Zone = ifelse(Zone != "Four", Zone, Zone1))
output = left_join(input, answer ,by=c('City'='City', 'Pop'='Pop', 'Test' = 'Test'))
output = output[c("City","Pop","Zone.y","Test")]
output[is.na(output)] <- ""
colnames(output) = colnames(input)

CodePudding user response:

Here is one option. Create a group that defines a switch from any value in Zone that does not include blank cells. Then check if the index of the group 1 is the same as the index for John and if the value of Zone in that index is "four. If so, change to "five".

library(tidyverse)

df |>
  mutate(grp = cumsum(Zone!=lag(Zone, default = first(Zone)) & Zone != "") |>
           replace(which(Zone == ""), NA)) |>
  mutate(flag = (grp  1) %in% grp[which(Zone == "John")] & Zone == "Four",
         Zone = ifelse(flag, "Five", Zone)) |>
  select(-c(grp, flag))
#>                City      Pop  Zone Test
#> 1            Mumbai 12442373   One  Yes
#> 2             Delhi 11007835           
#> 3         Bangalore  8425970           
#> 4               Red 31876178  Tony     
#> 5         Hyderabad  6809970   Two   No
#> 6         Ahmedabad  5570585           
#> 7           Chennai  4681087           
#> 8           Kolkata  4486679           
#> 9             Surat  4467797           
#> 10             Pune  3115431           
#> 11           Yellow 12069907  John     
#> 12           Jaipur  3046163 Three  Yes
#> 13          Lucknow  2815601           
#> 14           Kanpur  2767031           
#> 15           Nagpur  2405665           
#> 16           Indore  1960631           
#> 17             Blue  7133327  Mary     
#> 18           Bhopal  1795648  Five   No
#> 19 Pimpri-Chinchwad  1727692           
#> 20            Patna  1684222           
#> 21         Vadodara  1670806           
#> 22            Green  5082720  John
  • Related