I'm trying to replace NA values in a column of a dataframe based on strings in R. My data is called flora and looks like this:
species | mating system | pollination | cleistogamy |
---|---|---|---|
Blysmus compressus | generally self | | entirely cleistogamous |
Dactylis glomerata | NA | wind | cleistogamy not recorded |
Daucus carota | generally cross | NA | NA |
Agrostis curtisii | NA | wind | cleistogamy not recorded |
Hornungia petraea | generally self | insect | cleistogamy not recorded |
I want to replace NA values in the mating system column (msystem in my code) when the pollination column == "wind" and the cleistogamy column == "cleistogamy not recorded", without affecting existing values in the mating system column.
I've tried multiple approaches but keep coming up with errors. Applying the following code
is.na(flora$msystem) [is.na(flora$pollination)=="wind"& is.na(flora$cleistogamy)=="cleistogamy not recorded"]<- "generally cross"
I get an error saying the replacement has more rows than the data. Nothing seems to change in the data when I try either
flora %>%
  mutate(msystem = if_else(is.na(msystem) &
                             is.na(pollination) == "wind" &
                             is.na(cleistogamy) == "cleistogamy not recorded",
                           "generally cross", msystem))
or
flora %>%
  mutate(msystem = case_when(is.na(msystem) & pollination == "wind" & cleistogamy == "cleistogamy not recorded" ~ "generally cross",
                             TRUE ~ msystem))
I'm quite stumped, any advice would be extremely helpful!
CodePudding user response:
Here is one option
library(dplyr)
flora %>%
  mutate(msystem = case_when(is.na(msystem) &
                               pollination %in% "wind" &
                               cleistogamy %in% "cleistogamy not recorded" ~ "generally cross",
                             TRUE ~ msystem))
-output
species msystem pollination cleistogamy
1 Blysmus compressus generally self <NA> entirely cleistogamous
2 Dactylis glomerata generally cross wind cleistogamy not recorded
3 Daucus carota generally cross <NA> <NA>
4 Agrostis curtisii generally cross wind cleistogamy not recorded
5 Hornungia petraea generally self insect cleistogamy not recorded
data
flora <- structure(list(species = c("Blysmus compressus", "Dactylis glomerata",
"Daucus carota", "Agrostis curtisii", "Hornungia petraea"), msystem = c("generally self",
NA, "generally cross", NA, "generally self"), pollination = c(NA,
"wind", NA, "wind", "insect"), cleistogamy = c("entirely cleistogamous",
"cleistogamy not recorded", NA, "cleistogamy not recorded", "cleistogamy not recorded"
)), class = "data.frame", row.names = c(NA, -5L))
CodePudding user response:
In general, I think you do not understand what is.na(.) is meant to do. It returns a logical vector saying whether each element of its argument is NA or not. It does not filter or restrict assignment.
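To make that concrete, here is a minimal sketch on a toy vector (`x` and `na_flags` are made-up names for illustration, not part of your data). `is.na()` only reports where the missing values are; it is the logical vector it returns that you then use as an index.

```r
# Toy vector, invented for illustration
x <- c("a", NA, "b", NA)

# is.na() returns one TRUE/FALSE per element; it does not subset anything
na_flags <- is.na(x)
na_flags                   # FALSE TRUE FALSE TRUE

# Comparing that logical vector with a string is never TRUE
any(na_flags == "wind")    # FALSE

# To act on the missing entries, use the flags as an index
x[na_flags] <- "filled"
x                          # "a" "filled" "b" "filled"
```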
A few things:
Your base code fails because the left side of your assignment is empty:
is.na(flora$msystem)[is.na(flora$pollination)=="wind"& is.na(flora$cleistogamy)=="cleistogamy not recorded"] # logical(0)
We can find out why by breaking it down.
is.na(flora$msystem) # [1] FALSE TRUE FALSE TRUE FALSE
The above is fine.
is.na(flora$pollination)=="wind" # [1] FALSE FALSE FALSE FALSE FALSE
The above is a logical error. is.na(.) returns a logical vector, which will never equal the string literal "wind". If you remove is.na, you still have an issue, namely the NA in the result:
flora$pollination == "wind" # [1] FALSE TRUE NA TRUE FALSE
One common way around this is to use %in% instead:
flora$pollination %in% "wind" # [1] FALSE TRUE FALSE TRUE FALSE
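The difference matters as soon as the comparison is used as an index. A small sketch on a made-up vector (`poll` is hypothetical, not your actual column) shows what I mean:

```r
# Toy vector, invented for illustration
poll <- c("insect", "wind", NA)

poll == "wind"                # NA propagates: FALSE TRUE NA
poll %in% "wind"              # never NA:      FALSE TRUE FALSE

# Indexing with an NA in the condition yields an extra NA element
poll[poll == "wind"]          # "wind" NA
poll[poll %in% "wind"]        # "wind"

# which() is another way to drop the NAs from a == comparison
poll[which(poll == "wind")]   # "wind"
```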
Similarly for your last part, is.na(flora$cleistogamy).
Further, the base code will not do what you want because the LHS of the replacement is is.na(flora$msystem), so it will not replace what you want it to. You need to replace flora$msystem itself.
Perhaps:
flora$msystem <- ifelse(
  is.na(flora$msystem) &
    flora$cleistogamy %in% "cleistogamy not recorded" &
    flora$pollination %in% "wind",
  "generally cross", flora$msystem)
flora
# species msystem pollination cleistogamy
# 1 Blysmus compressus generally self entirely cleistogamous
# 2 Dactylis glomerata generally cross wind cleistogamy not recorded
# 3 Daucus carota generally cross <NA> <NA>
# 4 Agrostis curtisii generally cross wind cleistogamy not recorded
# 5 Hornungia petraea generally self insect cleistogamy not recorded
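If you would rather keep the indexed-assignment style of your original attempt, a sketch along these lines should also work. It assumes the flora data as posted in the other answer (rebuilt here with data.frame() so the snippet runs on its own); `fill_rows` is just a name I picked.

```r
# Data as posted in the other answer, rebuilt so this runs standalone
flora <- data.frame(
  species = c("Blysmus compressus", "Dactylis glomerata", "Daucus carota",
              "Agrostis curtisii", "Hornungia petraea"),
  msystem = c("generally self", NA, "generally cross", NA, "generally self"),
  pollination = c(NA, "wind", NA, "wind", "insect"),
  cleistogamy = c("entirely cleistogamous", "cleistogamy not recorded", NA,
                  "cleistogamy not recorded", "cleistogamy not recorded"),
  stringsAsFactors = FALSE
)

# Rows to fill: msystem is missing AND both other columns match
fill_rows <- is.na(flora$msystem) &
  flora$pollination %in% "wind" &
  flora$cleistogamy %in% "cleistogamy not recorded"

# Index flora$msystem itself (not is.na(flora$msystem)) on the left-hand side
flora$msystem[fill_rows] <- "generally cross"
flora$msystem
```

Because the condition is built with %in%, `fill_rows` contains no NA, so only rows 2 and 4 are touched and the existing values stay as they were.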
BTW, your case_when sample runs correctly (and so does the if_else one once the is.na() wrappers around pollination and cleistogamy are dropped), but I wonder if you're not reassigning the result back to flora. For instance, contrast this
flora %>%
mutate(msystem = if_else(is.na(msystem) & cleistogamy == "cleistogamy not recorded" & pollination == "wind", "generally cross", msystem))
# species msystem pollination cleistogamy
# 1 Blysmus compressus generally self <NA> entirely cleistogamous
# 2 Dactylis glomerata generally cross wind cleistogamy not recorded
# 3 Daucus carota generally cross <NA> <NA>
# 4 Agrostis curtisii generally cross wind cleistogamy not recorded
# 5 Hornungia petraea generally self insect cleistogamy not recorded
flora
# species msystem pollination cleistogamy
# 1 Blysmus compressus generally self <NA> entirely cleistogamous
# 2 Dactylis glomerata <NA> wind cleistogamy not recorded
# 3 Daucus carota generally cross <NA> <NA>
# 4 Agrostis curtisii <NA> wind cleistogamy not recorded
# 5 Hornungia petraea generally self insect cleistogamy not recorded
(unchanged flora contents) with
flora <- flora %>%
mutate(msystem = if_else(is.na(msystem) & cleistogamy == "cleistogamy not recorded" & pollination == "wind", "generally cross", msystem))
flora
# species msystem pollination cleistogamy
# 1 Blysmus compressus generally self <NA> entirely cleistogamous
# 2 Dactylis glomerata generally cross wind cleistogamy not recorded
# 3 Daucus carota generally cross <NA> <NA>
# 4 Agrostis curtisii generally cross wind cleistogamy not recorded
# 5 Hornungia petraea generally self insect cleistogamy not recorded