I have a list of about thousand data frames. All data frames have column Z and the column consists mostly on NA values, but whenever there is an actual value, it has either "VALUE1" or "VALUE2" in it. For example:
weight | height | Z
---------------------------
62 100 NA
65 89 NA
59 88 randomnumbersVALUE1randomtext
66 92 NA
64 90 NA
64 87 randomnumbersVALUE2randomtext
57 84 NA
The first actual value of each data frame in the column Z should always contain a value of "VALUE1" in it, so in the example data frame above everything is as it should be. However, if the data frame would look like this:
weight | height | Z
---------------------------
62 100 NA
65 89 NA
59 88 randomnumbersVALUE2randomtext
66 92 NA
64 90 NA
64 87 randomnumbersVALUE1randomtext
57 84 NA
I would need to add a new row into the beginning of the data frame with "VALUE1" in the Z column and value 0 in the height and weight columns. How could I do this for my list of data frames (with the help of functions such as add_row and filter)..?
CodePudding user response:
If dfs
is your list of dataframes, you can do this:
dfs = lapply(dfs, function(x) {
if(grepl("VALUE2", x[!is.na(x$Z),"Z"][1])) {
rbind(data.frame(weight=0,height=0,Z="VALUE1"),x)
} else x
})
CodePudding user response:
library(dplyr)
dat %>%
filter(!is.na(Z)) %>%
slice(1) %>%
mutate(across(weight:height, ~ 0)) %>%
filter(!grepl("VALUE1", Z)) %>%
mutate(Z = "VALUE1") %>%
bind_rows(., dat)
# weight height Z
# 1 0 0 VALUE1
# 2 62 100 <NA>
# 3 65 89 <NA>
# 4 59 88 randomnumbersVALUE2randomtext
# 5 66 92 <NA>
# 6 64 90 <NA>
# 7 64 87 randomnumbersVALUE1randomtext
# 8 57 84 <NA>
Data
dat <- structure(list(weight = c(62L, 65L, 59L, 66L, 64L, 64L, 57L), height = c(100L, 89L, 88L, 92L, 90L, 87L, 84L), Z = c(NA, NA, "randomnumbersVALUE2randomtext", NA, NA, "randomnumbersVALUE1randomtext", NA)), class = "data.frame", row.names = c(NA, -7L))