Home > OS >  Fill a data.frame column with values from other data.frame column with a condition R
Fill a data.frame column with values from other data.frame column with a condition R

Time:09-18

I have this two data.frames:

  df1 = data.frame(
    "scientific_name"=c("Anchietea ballardii","Anemia patens","Cololobus argenteus"),
    "threat_status"=c("VU","EN","EN")
  )

> df1
      scientific_name threat_status
1 Anchietea ballardii            VU
2       Anemia patens            EN
3 Cololobus argenteus            EN
> 


  df2 = data.frame(
    "scientific_name"=c("Anchietea ballardii","Anemia patens","Cololobus argenteus","Coleocephalocereus pluricostatus", "Dyckia bracteata", "Merianthera parvifolia", "Kielmeyera rupestris"),
    "threat_status"=c(NA)
  )

> df2
                     scientific_name threat_status
1                Anchietea ballardii            NA
2                      Anemia patens            NA
3                Cololobus argenteus            NA
4   Coleocephalocereus pluricostatus            NA
5                   Dyckia bracteata            NA
6             Merianthera parvifolia            NA
7               Kielmeyera rupestris            NA
>

I need to fill the df2$threat_status with the correspondent df1$threat_status, but when there is not any value for the column df2$threat_status it can be filled with "LC". I'm trying with sqldf INSERT TO, but it's not working. I's like to have a solution in plain R, something tells me it would be more elegant. Somebody could help? Thanks a lot!

CodePudding user response:

In base R, you can do:

df2 = merge(df2[,1,drop=F],df1, by="scientific_name", all.x=T)
df2[is.na(df2$threat_status), 2] <- "LC"

Output:

                   scientific_name threat_status
1              Anchietea ballardii            VU
2                    Anemia patens            EN
3 Coleocephalocereus pluricostatus            LC
4              Cololobus argenteus            EN
5                 Dyckia bracteata            LC
6             Kielmeyera rupestris            LC
7           Merianthera parvifolia            LC

Another option in your example is to simply set all the threat_status values in df2 to "LC", and then just row bind df1 to the rows in df2 where df1$scientfic_name does not appear:

df2$threat_status="LC"
rbind(df1,df2[!df2$scientific_name %in% df1$scientific_name,])

Output:

                   scientific_name threat_status
1              Anchietea ballardii            VU
2                    Anemia patens            EN
3              Cololobus argenteus            EN
4 Coleocephalocereus pluricostatus            LC
5                 Dyckia bracteata            LC
6           Merianthera parvifolia            LC
7             Kielmeyera rupestris            LC
  • Related