I am trying to find a way to do the following:
Find the string "John" in column "Zone", then if the previous cell above it with a value (some rows are empty) in "Zone" column was the string "Four", replace that string with the word "Five".
I could do this with a mutate/lead function but the space between the data is not uniform, and I can't figure out how to count back the required rows.
My next thought was I could extract only the rows from the "Zone" column into a new dataframe so that the data was more uniform, and then mutate as required and then merge the data back into the original dataframe. This became too messy and I couldn't make it work either.
Can anyone suggest a way to achieve this? Any help would be greatly appreciated.
Sample input:
structure(list(City = c("Mumbai", "Delhi", "Bangalore", "Red",
"Hyderabad", "Ahmedabad", "Chennai", "Kolkata", "Surat", "Pune",
"Yellow", "Jaipur", "Lucknow", "Kanpur", "Nagpur", "Indore",
"Blue", "Bhopal", "Pimpri-Chinchwad", "Patna", "Vadodara", "Green"
), Pop = c("12442373", "11007835", "8425970", "31876178", "6809970",
"5570585", "4681087", "4486679", "4467797", "3115431", "12069907",
"3046163", "2815601", "2767031", "2405665", "1960631", "7133327",
"1795648", "1727692", "1684222", "1670806", "5082720"), Zone = c("One",
"", "", "Tony", "Two", "", "", "", "", "", "John", "Three", "",
"", "", "", "Mary", "Four", "", "", "", "John"), Test = c("Yes",
"", "", "", "No", "", "", "", "", "", "", "Yes", "", "", "",
"", "", "No", "", "", "", "")), row.names = 1:22, class = "data.frame")
Desired output
structure(list(City = c("Mumbai", "Delhi", "Bangalore", "Red",
"Hyderabad", "Ahmedabad", "Chennai", "Kolkata", "Surat", "Pune",
"Yellow", "Jaipur", "Lucknow", "Kanpur", "Nagpur", "Indore",
"Blue", "Bhopal", "Pimpri-Chinchwad", "Patna", "Vadodara", "Green"
), Pop = c("12442373", "11007835", "8425970", "31876178", "6809970",
"5570585", "4681087", "4486679", "4467797", "3115431", "12069907",
"3046163", "2815601", "2767031", "2405665", "1960631", "7133327",
"1795648", "1727692", "1684222", "1670806", "5082720"), Zone = c("One",
"", "", "Tony", "Two", "", "", "", "", "", "John", "Three", "",
"", "", "", "Mary", "Five", "", "", "", "John"), Test = c("Yes",
"", "", "", "No", "", "", "", "", "", "", "Yes", "", "", "",
"", "", "No", "", "", "", "")), row.names = 1:22, class = "data.frame")
CodePudding user response:
I've made it work like this. It's not elegant, but I suppose if it works it works.
test = filter(input, Zone != "")
answer <- test %>%
mutate(Zone1 = if_else(lead(Zone, default = first(Zone)) == "John", "Five", Zone))
answer = answer %>% dplyr::mutate(Zone = ifelse(Zone != "Four", Zone, Zone1))
output = left_join(input, answer ,by=c('City'='City', 'Pop'='Pop', 'Test' = 'Test'))
output = output[c("City","Pop","Zone.y","Test")]
output[is.na(output)] <- ""
colnames(output) = colnames(input)
CodePudding user response:
Here is one option. Create a group that defines a switch from any value in Zone
that does not include blank cells. Then check if the index of the group 1 is the same as the index for John and if the value of Zone in that index is "four. If so, change to "five".
library(tidyverse)
df |>
mutate(grp = cumsum(Zone!=lag(Zone, default = first(Zone)) & Zone != "") |>
replace(which(Zone == ""), NA)) |>
mutate(flag = (grp 1) %in% grp[which(Zone == "John")] & Zone == "Four",
Zone = ifelse(flag, "Five", Zone)) |>
select(-c(grp, flag))
#> City Pop Zone Test
#> 1 Mumbai 12442373 One Yes
#> 2 Delhi 11007835
#> 3 Bangalore 8425970
#> 4 Red 31876178 Tony
#> 5 Hyderabad 6809970 Two No
#> 6 Ahmedabad 5570585
#> 7 Chennai 4681087
#> 8 Kolkata 4486679
#> 9 Surat 4467797
#> 10 Pune 3115431
#> 11 Yellow 12069907 John
#> 12 Jaipur 3046163 Three Yes
#> 13 Lucknow 2815601
#> 14 Kanpur 2767031
#> 15 Nagpur 2405665
#> 16 Indore 1960631
#> 17 Blue 7133327 Mary
#> 18 Bhopal 1795648 Five No
#> 19 Pimpri-Chinchwad 1727692
#> 20 Patna 1684222
#> 21 Vadodara 1670806
#> 22 Green 5082720 John