Using an if/then condition with the |> pipe character

Time:01-14

I need to extract the last names of several thousand people. Each name is either two or three words long, depending on whether it has a suffix. My approach is to count the number of words in each row, then run a different separate() call depending on the count. The following code does not work but shows my thinking:

customers = data.frame(names=c("Jack Quinn III", "David Powell", "Carrie Green",
           "Steven Miller, Jr.", "Christine Powers", "Amanda Ramirez"))

customers |> 
  mutate(names_count = str_count(names, "\\w+")) |>
  {
  if(names_count == 2,
     separate(name, c("first_name", "last_name") ),
     separate(name, c("first_name", "last_name", "suffix") )
  )
  }

I can't get this code to work, and I don't know how to interpret the error messages. I'm not even sure whether the commas belong in the if statement, because there are apparently functions that use them and others that don't.

My thought was that I could get the names split into columns by doing

df |> 
  mutate() to count words |> 
  separate() to split columns based on count

but I can't get even the simplest if statement to work.

CodePudding user response:

We could use word from stringr instead:

library(stringr)
library(dplyr)

customers |>
    mutate(last_name = word(names, 2))

Output:

               names last_name
1     Jack Quinn III     Quinn
2       David Powell    Powell
3       Carrie Green     Green
4 Steven Miller, Jr.   Miller,
5   Christine Powers    Powers
6     Amanda Ramirez   Ramirez
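
Note that row 4 in the output above still carries the comma ("Miller,"). One possible way to clean that up, sketched here on the assumption that trailing punctuation is the only stray character, is to pass the result of word() through str_remove(). The variable name last_names is only illustrative.

```r
library(stringr)

# Same example data as in the question
customers <- data.frame(names = c("Jack Quinn III", "David Powell", "Carrie Green",
                                  "Steven Miller, Jr.", "Christine Powers", "Amanda Ramirez"))

# Take the second word, then strip any punctuation at the end of it
# (assumption: the only unwanted characters are trailing ones, like the comma)
last_names <- str_remove(word(customers$names, 2), "[[:punct:]]+$")

print(last_names)
```

This keeps the approach of the answer and only adds one cleanup step; a hyphen or apostrophe inside a name is left alone because the pattern is anchored to the end of the word.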

CodePudding user response:

Using str_extract:

library(dplyr)
library(stringr)
# group = 1 returns only the captured second word (needs stringr >= 1.5.0)
customers %>%
  mutate(last_name = str_extract(names, "^[A-Za-z]+\\s+([A-Za-z]+)", group = 1))

Output:

               names last_name
1     Jack Quinn III     Quinn
2       David Powell    Powell
3       Carrie Green     Green
4 Steven Miller, Jr.    Miller
5   Christine Powers    Powers
6     Amanda Ramirez   Ramirez

CodePudding user response:

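You can drop the if entirely and always split into three columns; two-word names simply get NA in the suffix column:

library(dplyr)
library(tidyr)
customers %>% 
  separate(names, into = c("first_name", "last_name", "suffix"), sep = " ", fill = "right") %>% 
  select(last_name)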

If you want to avoid extra packages, you can use base R's sub() with a regex:

> sub("[A-Za-z]+\\s+([A-Za-z]+)\\s?.*", "\\1", customers$names)
[1] "Quinn"   "Powell"  "Green"   "Miller"  "Powers"  "Ramirez"
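
As for the if in the original attempt: if() expects a single TRUE or FALSE, while names_count is a whole column, and separate() works on the entire data frame rather than row by row. If per-row branching is really wanted, a vectorized function such as dplyr's if_else() inside mutate() is one option. The following is a rough sketch under that assumption, not the only way to do it; the column names suffix and last_name and the object result are just illustrative.

```r
library(dplyr)
library(stringr)

# Same example data as in the question
customers <- data.frame(names = c("Jack Quinn III", "David Powell", "Carrie Green",
                                  "Steven Miller, Jr.", "Christine Powers", "Amanda Ramirez"))

result <- customers |>
  mutate(
    # Count runs of non-whitespace characters, i.e. words
    names_count = str_count(names, "\\S+"),
    # if_else() checks the condition for every row, unlike a plain if()
    suffix = if_else(names_count == 3, word(names, 3), NA_character_),
    # The last name is always the second word; drop a trailing comma if present
    last_name = str_remove(word(names, 2), ",$")
  )

print(result)
```

Unlike if(), if_else() takes its three arguments separated by commas, which may be where the comma confusion in the question comes from.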