I would love to replace empty/NA values from a data table with other values that are shown in the variable of the same data table
So I have something like this:
Country | Nationality | currentLoc |
---|---|---|
AR | ES | RU |
empty | empty | ES |
empty | AR | AR |
EN | EN | EN |
EN | ES | ES |
Result should be:
Country | Nationality | currentLoc |
---|---|---|
AR | ES | RU |
ES | empty | ES |
AR | AR | EN |
EN | EN | EN |
EN | ES | ES |
I wrote the following R code:
df1$Country[df1$Country== "" | df1$Country==" "] <- "empty"
df1$Country <-ifelse(df1$Country == "empty", df1$Nationality, ifelse(df1$Nationality == "empty", df1$currentLoc, df1$Country))
However not all "empty" cells are replaced in the "Country" Column.
I tried already to specify the empty spaces in the df1$Country
with NA
, so I have done like
df1$Country[df1$Country== "" | df1$Country==" "] <- NA
df1$Country <-ifelse(is.na(df1$Country), df1$Nationality, ifelse(is.na(df1$Nationality), df1$currentLoc, df1$Country))
That is unfortunately also not working.
Could you help me out in this? I searched for similar questions here, but did not quit get the desired response.
Any help very much appriciated. :)
Cheers
Le1nO
CodePudding user response:
If I understood correctly, you want to relace "empty"
values in the Country
variable with the value of the Nationality
column, and in the case the Nationality
variable is also "empty"
, then replace Country
with the currentLoc
value.
If this is correct, your first attempt is close to what you want, but you need to fix it a little. ifelse
take a test, a value if the test is true and a value if the test is false, so:
test:
df1$Country == "empty"
meaning testing ifCountry
is emptyvalue for true: If
Country
is empty, means that we need to takeNationality
, but this variable can also be empty, so we need to check for it with anotherifelse
, something like this:ifelse(df1$Nationality == "empty", df1$currentLoc, df1$Nationality)
which means, check if nationality is empty and if it is, takecurrentLoc
, if not left theNationality
value.value for false: if
Country
is not empty, then nothing to do, left the country value.
So something like this should work:
df1 <- data.frame(
Country = c("AR", "empty", "empty", "EN", "EN"),
Nationality = c("ES", "empty", "AR", "EN", "ES"),
currentLoc = c("RU", "ES", "AR", "EN", "ES")
)
df1$Country <- ifelse(
test = df1$Country == "empty",
yes = ifelse(df1$Nationality == "empty", df1$currentLoc, df1$Nationality),
no = df1$Country
)
df1
#> Country Nationality currentLoc
#> 1 AR ES RU
#> 2 ES empty ES
#> 3 AR AR AR
#> 4 EN EN EN
#> 5 EN ES ES
Created on 2022-02-23 by the reprex package (v2.0.1)
CodePudding user response:
An approach using the data.table package:
library(data.table)
df1 <- data.table(
Country = c("AR", "" , "" , "EN", "EN"),
Nationality = c("ES", "" , "AR", "EN", "ES"),
currentLoc = c("RU", "ES", "AR", "EN", "ES")
)
First step: convert all blanks to NA:
df1[df1==""] <- NA
Second step: Country
equals to Country
, otherwise Nationality
, otherwise currentLoc
:
> df1 [, Country:=fcoalesce(Country,Nationality, currentLoc)][]
Country Nationality currentLoc
<char> <char> <char>
1: AR ES RU
2: ES <NA> ES
3: AR AR AR
4: EN EN EN
5: EN ES ES
OBS: You may apply a regex for testing for "empty" records (for instance, "", "empty", " ", etc.) by substituting the instruction df1[df1==""] <- NA
for:
regex_empty <- "^ *$|empty" # regex that captures multiple spaces and "empty" records
regex_mapping <- df2[, lapply(.SD,function(x)grepl(regex_empty, x, perl = T))] |> as.matrix()
df2[regex_mapping] <- NA
This code will make it possible to deal with cases such as df2:
> df2 <- data.table(
Country = c("AR", "" , "empty", "EN", "EN"),
Nationality = c("ES", " ", "AR" , "EN", "ES"),
currentLoc = c("RU", "ES" , "AR" , "EN", "ES")
)
>
> regex_empty <- "^ *$|empty"
> regex_mapping <- df2[, lapply(.SD,function(x)grepl(regex_empty, x, perl = T))] |> as.matrix()
> df2[regex_mapping] <- NA
> df2 [, Country:=fcoalesce(Country,Nationality, currentLoc)][]
Country Nationality currentLoc
<char> <char> <char>
1: AR ES RU
2: ES <NA> ES
3: AR AR AR
4: EN EN EN
5: EN ES ES
>
> all.equal(df1, df2)
[1] TRUE