I have a data frame of character strings that includes NAs. Here is a modified subset of it:
subdf
Col1 Col2
1 <NA> <NA>
2 Other Services <NA>
3 Other Services <NA>
4 Other Services Services of lawyers
5 Other Services <NA>
I want to replace the NAs depending on the value of the cell to their left or right. I tried to do this the following way:
subdf$Col1[subdf$Col2=="Services of lawyers"]
[1] NA NA
[3] NA "Other Services"
[5] NA
As is apparent, I get erratic output when looking up the NA cell. This makes it impossible to reliably replace the appropriate NA value.
na.omit() is obviously not applicable, since I am expecting NA as output in order to replace it.
Answer:
TL;DR
You could wrap your logical test in which() to remove the unexpected NA results of the subsetting operation:
subdf$Col1[which(subdf$Col2=="Services of lawyers")]
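The same which() wrapper also works on the left-hand side of an assignment, which is what you need to actually replace values. The sketch below is only an illustration: it rebuilds the question's data assuming the missing cells are real NA values, and the fill rule and the label "Unspecified service" are hypothetical.

```r
# Rebuild the example data, assuming the cells shown as <NA> are real NA values
subdf <- data.frame(
  Col1 = c(NA, "Other Services", "Other Services", "Other Services", "Other Services"),
  Col2 = c(NA, NA, NA, "Services of lawyers", NA),
  stringsAsFactors = FALSE
)

# which() keeps only the positions where the whole test is TRUE;
# row 1 gives NA (NA == "Other Services") and is therefore skipped
rows <- which(subdf$Col1 == "Other Services" & is.na(subdf$Col2))
rows
#> [1] 2 3 5

# Hypothetical rule: fill the missing Col2 cells in those rows with a placeholder label
subdf$Col2[rows] <- "Unspecified service"
subdf
```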
Explanation
I think we can replicate your issue like this. Suppose we have a data frame with no NA values:
df1 <- data.frame(x = c("A", "B", "C"), y = 1:3)
If we want to find the values of column y when x == "A", we do:
df1$y[df1$x == "A"]
#> [1] 1
This gives us the expected result. But look what happens when there are NA values in x:
df2 <- data.frame(x = c("A", "B", NA), y = 1:3)
What result would you expect now?
df2$y[df2$x == "A"]
#> [1] 1 NA
This might seem unexpected. We only wanted the values of y where x is "A", but we got a result of length 2, which matches neither the number of rows in the data frame nor the number of "A"s in it. Why?
It is because we are subsetting by the logical vector df2$x == "A", which is:
df2$x == "A"
#> [1] TRUE FALSE NA
So when we subset by this, the first element is selected and the second is dropped, but the third is not dropped: subsetting by NA returns NA. That is why we get two elements back.
The simple way to suppress this is to wrap your logical test in which(): it converts the logical vector to the numeric indices of its TRUE elements and quietly drops NA values:
df2$y[which(df2$x == "A")]
#> [1] 1
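As a side-by-side sketch, here is what which() returns on its own, next to %in%, a swapped-in base R alternative not used in the answer above. %in% never returns NA (NA %in% "A" is FALSE), so it also avoids the extra NA; one difference to keep in mind is that NA %in% NA is TRUE.

```r
df2 <- data.frame(x = c("A", "B", NA), y = 1:3)

# which() turns the logical vector into positions, treating NA as "not selected"
which(df2$x == "A")
#> [1] 1

# %in% yields FALSE rather than NA for the missing value
df2$x %in% "A"
#> [1]  TRUE FALSE FALSE
df2$y[df2$x %in% "A"]
#> [1] 1
```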
Answer:
You could try
library(dplyr)

table <- data.frame(Col1 = c(NA, "B", "C"), Col2 = c("A'", "B'", "C'"))

# Where Col1 is missing, fill it with the leading capital letters of Col2
table %>%
  mutate(
    Col1 = ifelse(is.na(Col1), stringr::str_extract(Col2, "[A-Z]+"), Col1)
  )
Edit for new data:
df <- tibble::tribble(
  ~Col1,            ~Col2,
  NA,               NA,
  "Other Services", NA,
  "Other Services", NA,
  NA,               "Services of lawyers",
  "Other Services", NA
)

# "<NA>" is only how a missing value is printed; test for it with is.na()
df %>%
  mutate(
    Col1 = ifelse(is.na(Col1), Col2, Col1)
  )
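For real missing values (NA rather than the literal string "<NA>"), dplyr's coalesce() performs this "use Col1, otherwise fall back to Col2" fill in one step; it returns the first non-missing value element by element. A minimal sketch, assuming a small made-up tibble modeled on the data above:

```r
library(dplyr)

# Example data with real NA values
df <- tibble::tibble(
  Col1 = c(NA, "Other Services", "Other Services", NA, "Other Services"),
  Col2 = c(NA, NA, NA, "Services of lawyers", NA)
)

# coalesce(Col1, Col2): keep Col1 where present, otherwise take Col2;
# row 1 stays NA because both columns are missing there
filled <- df %>%
  mutate(Col1 = coalesce(Col1, Col2))
filled
```

This behaves like ifelse(is.na(Col1), Col2, Col1) but also checks that both columns have compatible types.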