Home > front end >  split a column in two existing columns if a condition is matched in R
split a column in two existing columns if a condition is matched in R

Time:05-11

I have a df like this:

df <- data.frame(id=c("j1", "j2", "j3/j9", "j5", "j2/j8", "j3/j4"), dad=c("j10", "j11", "", "j13", "", ""), mom=c("k2", "k4", "", "k6", "", ""))

I am trying to split only those cells that contain a slash "/" in the column "id". I want to get the split string in the existing columns "dad" and "mom". The desired output would be this:

df2 <- data.frame(id=c("j1", "j2", "j3/j9", "j5", "j2/j8", "j3/j4"), dad=c("j10", "j11", "j3", "j13", "j2", "j3"), mom=c("k2", "k4", "j9", "k6", "j8", "j4"))

I am trying this code:

df3 <- tidyr::separate(data = df, col = "id", into = c("dad", "mom"), sep = "/")

but that split the entire column "id" in two new columns. Any idea how to solve this?

CodePudding user response:

Here is one method with coalesce after separateing - convert the blanks ("") to NA (na_if), separate the 'id' into 'dad2', 'mom2', columns, loop across the 'dad', 'mom' columns and coalesce with the corresponding 'dad2', 'mom2' columns

library(dplyr)
library(tidyr)
library(stringr)
df %>% 
  na_if("") %>% 
  separate(id, into = c("dad2", "mom2"), sep = "/", fill = "right", 
       remove = FALSE) %>% 
  mutate(across(dad:mom, ~ coalesce(.x, get(str_c(cur_column(), 
          2)))), .keep = "unused")

-output

     id dad mom
1    j1 j10  k2
2    j2 j11  k4
3 j3/j9  j3  j9
4    j5 j13  k6
5 j2/j8  j2  j8
6 j3/j4  j3  j4

Or slightly more useful will be across2 from dplyover

library(dplyover)
df %>%
   na_if("") %>% 
   separate(id, into = c("dad2", "mom2"), sep = "/", fill = "right", 
       remove = FALSE) %>% 
   mutate(across2(dad:mom, dad2:mom2, coalesce, .names = "{xcol}")) %>%
   select(names(df))

CodePudding user response:

You can use grep to get the lines with /, than use strsplit and insert the result back in df.

i <- grep("/", df$id)
. <- do.call(rbind, strsplit(df$id[i], "/", TRUE))
df$dad[i] <- .[,1]
df$mom[i] <- .[,2]
df
##     id dad mom
#1    j1 j10  k2
#2    j2 j11  k4
#3 j3/j9  j3  j9
#4    j5 j13  k6
#5 j2/j8  j2  j8
#6 j3/j4  j3  j4

Or using sub.

i <- grep("/", df$id)
df$dad[i] <- sub("/.*", "", df$id[i])
df$mom[i] <- sub(".*/", "", df$id[i])

CodePudding user response:

You could use:

library(dplyr)
library(stringr)

df %>% 
  mutate(dad = if_else(str_detect(id, "/"), str_extract(id, ".*(?=/)"), dad),
         mom = if_else(str_detect(id, "/"), str_extract(id, "(?<=/).*"), mom))

This returns

     id dad mom
1    j1 j10  k2
2    j2 j11  k4
3 j3/j9  j3  j9
4    j5 j13  k6
5 j2/j8  j2  j8
6 j3/j4  j3  j4
  • Related