In R is there a way to create a new column based on column names and values? Tidy solution welcome

Time:05-19

Specifically, I have an untidy data.frame with subspecies and varieties in separate columns, like this:

# Data
Genus <- c("Metrosideros", "Gahnia", "Acacia")
Species <- c("polymorpha", "aspera", "koa")
Subspecies <- c("", "globosa", "")
Variety <- c("glaberrima", "", "")
df <- data.frame(Genus, Species, Subspecies, Variety)

But I want a new column that looks like this:

df$Sciname <- c("Metrosideros polymorpha var. glaberrima",
               "Gahnia aspera subsp. globosa",
               "Acacia koa")

There is probably a clever solution using paste() and ifelse(), but I cannot figure it out. A tidyverse (dplyr) solution is also welcome. Thanks for any help!

CodePudding user response:

You can get there with paste() and a little bit of indexing.

with(df, paste(
  Genus,
  Species,
  c("", "subsp.")[(Subspecies != "") + 1],
  Subspecies,
  c("", "var.")[(Variety != "") + 1],
  Variety
))

[1] "Metrosideros polymorpha   var. glaberrima" "Gahnia aspera subsp. globosa   "           "Acacia koa    "

Applying stringr::str_squish() to the result removes the unwanted spaces, giving:

[1] "Metrosideros polymorpha var. glaberrima" "Gahnia aspera subsp. globosa"            "Acacia koa"  
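
To make the indexing step and the cleanup concrete, here is a minimal sketch that rebuilds the question's data and assigns the result to df$Sciname. The helper names has_subsp and raw are only illustrative, and the gsub()/trimws() pair is a base-R stand-in for stringr::str_squish(), used here so the example needs no extra packages.

```r
df <- data.frame(
  Genus      = c("Metrosideros", "Gahnia", "Acacia"),
  Species    = c("polymorpha", "aspera", "koa"),
  Subspecies = c("", "globosa", ""),
  Variety    = c("glaberrima", "", ""),
  stringsAsFactors = FALSE
)

# A logical vector plus 1 gives 1 for FALSE and 2 for TRUE, so it picks
# the first or second element of a two-element lookup vector.
has_subsp <- df$Subspecies != ""     # illustrative name
c("", "subsp.")[has_subsp + 1]       # "" "subsp." ""

# Paste all parts together; empty parts leave extra spaces behind.
raw <- with(df, paste(
  Genus,
  Species,
  c("", "subsp.")[(Subspecies != "") + 1],
  Subspecies,
  c("", "var.")[(Variety != "") + 1],
  Variety
))

# Collapse runs of whitespace and trim the ends
# (base-R substitute for stringr::str_squish()).
df$Sciname <- trimws(gsub("\\s+", " ", raw))
```

If stringr is installed, stringr::str_squish(raw) should give the same result as the last line.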

CodePudding user response:

Here's another option with the tidyverse. We prefix the non-empty Subspecies and Variety values with "subsp. " and "var. ", use unite() to combine all the columns into Sciname, clean up the whitespace in Sciname, and then bind that column back onto the original data frame.

library(tidyverse)

df %>%
  # Prefix non-empty ranks; empty strings are left as they are
  mutate(Subspecies = ifelse(Subspecies != "", paste0("subsp. ", Subspecies), Subspecies),
         Variety = ifelse(Variety != "", paste0("var. ", Variety), Variety)) %>%
  # na.rm only drops NA values, so empty strings still add a separator
  unite("Sciname", Genus:Variety, sep = " ", remove = FALSE, na.rm = TRUE) %>%
  select(Sciname) %>%
  # str_squish() removes leading, trailing, and doubled interior spaces
  mutate(Sciname = str_squish(Sciname)) %>%
  bind_cols(df, .)

Output

         Genus    Species Subspecies    Variety                                 Sciname
1 Metrosideros polymorpha            glaberrima Metrosideros polymorpha var. glaberrima
2       Gahnia     aspera    globosa                       Gahnia aspera subsp. globosa
3       Acacia        koa                                                    Acacia koa
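
Note that unite()'s na.rm argument only drops NA values, not empty strings. As a possible variation, the sketch below builds the prefixed parts as NA when a rank is empty, so unite() can skip them and no whitespace cleanup is needed. The helper columns subsp_part and var_part are hypothetical names introduced for this example, and the data frame is rebuilt from the question so the block runs on its own.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Genus      = c("Metrosideros", "Gahnia", "Acacia"),
  Species    = c("polymorpha", "aspera", "koa"),
  Subspecies = c("", "globosa", ""),
  Variety    = c("glaberrima", "", ""),
  stringsAsFactors = FALSE
)

result <- df %>%
  mutate(
    # Hypothetical helper columns: prefixed rank, or NA when the rank is empty
    subsp_part = if_else(Subspecies != "", paste("subsp.", Subspecies), NA_character_),
    var_part   = if_else(Variety != "", paste("var.", Variety), NA_character_)
  ) %>%
  # With na.rm = TRUE, the NA parts are dropped before joining
  unite("Sciname", Genus, Species, subsp_part, var_part,
        sep = " ", remove = FALSE, na.rm = TRUE) %>%
  # Keep the original columns and put Sciname last
  select(Genus, Species, Subspecies, Variety, Sciname)

result$Sciname
```

This keeps the original Subspecies and Variety columns untouched, so there is no need to select Sciname and bind it back onto df.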