In the following dataframe, I want to separate and put every sub-link (divided by /) into separate columns
df <- data.frame (URL = c("/es/export-340130-from-mx-to-us/market", "/ar/category/access/regulations/requirements"),
X = c(100,200))
URL X
1 /es/export-340130-from-mx-to-us/market-overview 100
2 /ar/category/access/regulations/requirements 200
Like:
CodePudding user response:
We could split the URL at the /
with cSplit
which creates columns in the dataset and then rename the column afterwards (or rename the URL to path before we split)
library(splitstackshape)
library(dplyr)
library(stringr)
df %>%
mutate(URL = trimws(URL, whitespace = "/")) %>%
cSplit("URL", "/") %>%
rename_with(~ str_replace(.x, "URL_", "path")) %>%
relocate(X, .after = last_col())
-output
path1 path2 path3 path4 path5 X
<char> <char> <char> <char> <char> <num>
1: es export-340130-from-mx-to-us market <NA> <NA> 100
2: ar category access regulations requirements 200
Or with unnest
library(tidyr)
df %>%
mutate(URL = strsplit(trimws(URL, whitespace = "/"), "/", fixed = TRUE)) %>%
rename(path = URL) %>%
unnest_wider(path, names_sep = "")
-output
# A tibble: 2 × 6
path1 path2 path3 path4 path5 X
<chr> <chr> <chr> <chr> <chr> <dbl>
1 es export-340130-from-mx-to-us market <NA> <NA> 100
2 ar category access regulations requirements 200
Or in reverse
library(purrr)
df %>%
mutate(URL = strsplit(trimws(URL, whitespace = "/"), "/",
fixed = TRUE)) %>%
rename(path = URL) %>%
mutate(path = map(path, ~ `length<-`(.x, max(lengths(path))) %>%
{.[order(complete.cases(.))]})) %>%
unnest_wider(path, names_sep = "")
-output
# A tibble: 2 × 6
path1 path2 path3 path4 path5 X
<chr> <chr> <chr> <chr> <chr> <dbl>
1 <NA> <NA> es export-340130-from-mx-to-us market 100
2 ar category access regulations requirements 200
CodePudding user response:
in base R you would do:
a <- read.table(text=trimws(df$URL,whitespace = "/"), sep = '/', fill =TRUE )
names(a) <- paste0("path", seq(ncol(a)))
cbind(a, df['X'])
path1 path2 path3 path4 path5 X
1 es export-340130-from-mx-to-us market 100
2 ar category access regulations requirements 200
CodePudding user response:
Using only tidyverse tools:
library(stringr)
library(dplyr)
library(tidyr)
df |>
mutate(URL = trimws(URL,which='left',whitespace = '/')) |>
separate(col = URL,
into = paste0('path',c(1:5)),
sep = '/')
Output:
path1 path2 path3 path4 path5 x
1 es export-340130-from-mx-to-us market-overview <NA> <NA> 100
2 ar category access regulations requirements 200