I have this two data.frames:
df1 = data.frame(
"scientific_name"=c("Anchietea ballardii","Anemia patens","Cololobus argenteus"),
"threat_status"=c("VU","EN","EN")
)
> df1
scientific_name threat_status
1 Anchietea ballardii VU
2 Anemia patens EN
3 Cololobus argenteus EN
>
df2 = data.frame(
"scientific_name"=c("Anchietea ballardii","Anemia patens","Cololobus argenteus","Coleocephalocereus pluricostatus", "Dyckia bracteata", "Merianthera parvifolia", "Kielmeyera rupestris"),
"threat_status"=c(NA)
)
> df2
scientific_name threat_status
1 Anchietea ballardii NA
2 Anemia patens NA
3 Cololobus argenteus NA
4 Coleocephalocereus pluricostatus NA
5 Dyckia bracteata NA
6 Merianthera parvifolia NA
7 Kielmeyera rupestris NA
>
I need to fill the df2$threat_status with the correspondent df1$threat_status, but when there is not any value for the column df2$threat_status it can be filled with "LC". I'm trying with sqldf INSERT TO, but it's not working. I's like to have a solution in plain R, something tells me it would be more elegant. Somebody could help? Thanks a lot!
CodePudding user response:
In base R, you can do:
df2 = merge(df2[,1,drop=F],df1, by="scientific_name", all.x=T)
df2[is.na(df2$threat_status), 2] <- "LC"
Output:
scientific_name threat_status
1 Anchietea ballardii VU
2 Anemia patens EN
3 Coleocephalocereus pluricostatus LC
4 Cololobus argenteus EN
5 Dyckia bracteata LC
6 Kielmeyera rupestris LC
7 Merianthera parvifolia LC
Another option in your example is to simply set all the threat_status
values in df2
to "LC"
, and then just row bind df1
to the rows in df2
where df1$scientfic_name
does not appear:
df2$threat_status="LC"
rbind(df1,df2[!df2$scientific_name %in% df1$scientific_name,])
Output:
scientific_name threat_status
1 Anchietea ballardii VU
2 Anemia patens EN
3 Cololobus argenteus EN
4 Coleocephalocereus pluricostatus LC
5 Dyckia bracteata LC
6 Merianthera parvifolia LC
7 Kielmeyera rupestris LC