Home > Mobile >  R: How to separate string and put in different columns
R: How to separate string and put in different columns

Time:10-21

In the following dataframe, I want to separate and put every sub-link (divided by /) into separate columns

df <- data.frame (URL  = c("/es/export-340130-from-mx-to-us/market", "/ar/category/access/regulations/requirements"),
                  X = c(100,200))

                                              URL   X
1 /es/export-340130-from-mx-to-us/market-overview 100
2    /ar/category/access/regulations/requirements 200

Like:

enter image description here

CodePudding user response:

We could split the URL at the / with cSplit which creates columns in the dataset and then rename the column afterwards (or rename the URL to path before we split)

library(splitstackshape)
library(dplyr)
library(stringr)
df %>% 
  mutate(URL = trimws(URL, whitespace = "/")) %>% 
  cSplit("URL", "/") %>%
  rename_with(~ str_replace(.x, "URL_", "path")) %>%
  relocate(X, .after = last_col())

-output

 path1                       path2  path3       path4        path5     X
   <char>                      <char> <char>      <char>       <char> <num>
1:     es export-340130-from-mx-to-us market        <NA>         <NA>   100
2:     ar                    category access regulations requirements   200

Or with unnest

library(tidyr)
df %>%
   mutate(URL = strsplit(trimws(URL, whitespace = "/"), "/", fixed = TRUE)) %>%
   rename(path = URL) %>% 
   unnest_wider(path, names_sep = "")

-output

# A tibble: 2 × 6
  path1 path2                       path3  path4       path5            X
  <chr> <chr>                       <chr>  <chr>       <chr>        <dbl>
1 es    export-340130-from-mx-to-us market <NA>        <NA>           100
2 ar    category                    access regulations requirements   200

Or in reverse

library(purrr)
df %>%
   mutate(URL = strsplit(trimws(URL, whitespace = "/"), "/",
    fixed = TRUE)) %>% 
  rename(path = URL) %>% 
  mutate(path = map(path, ~ `length<-`(.x, max(lengths(path))) %>% 
      {.[order(complete.cases(.))]})) %>% 
  unnest_wider(path, names_sep = "")

-output

# A tibble: 2 × 6
  path1 path2    path3  path4                       path5            X
  <chr> <chr>    <chr>  <chr>                       <chr>        <dbl>
1 <NA>  <NA>     es     export-340130-from-mx-to-us market         100
2 ar    category access regulations                 requirements   200

CodePudding user response:

in base R you would do:

a <- read.table(text=trimws(df$URL,whitespace = "/"), sep = '/', fill =TRUE )
names(a) <- paste0("path", seq(ncol(a)))
cbind(a, df['X'])

path1                       path2  path3       path4        path5   X
1    es export-340130-from-mx-to-us market                          100
2    ar                    category access regulations requirements 200

CodePudding user response:

Using only tidyverse tools:

library(stringr)
library(dplyr)
library(tidyr)

df |>
   mutate(URL = trimws(URL,which='left',whitespace = '/')) |>
   separate(col = URL,
            into = paste0('path',c(1:5)),
            sep = '/')

Output:

  path1                       path2           path3       path4        path5   x
1    es export-340130-from-mx-to-us market-overview        <NA>         <NA> 100
2    ar                    category          access regulations requirements 200
  • Related