I have a column in a dataframe with scraped prices like this:
prices
$1,50 $1,20
$1,50
$1,75 $1,25 $1,35
In summary in each column I can have many prices. What I would like is to obtain different columns that are separated from $, this is what I need based on the example that I put.
prices price1 price2 price3
$1,50 $1,20 1,50 1,20 NA
$1,50 1,50 NA NA
$1,75 $1,25 $1,35 1,75 1,25 1,35
I have tried the following but neither option does what I need. Help
str_split(prices, pattern = '[$]') # I get a column with values like this c("", "1,50")
separate(prices, sep = '[$]', into = c("price1", "price2"), remove = FALSE)
#Price1 is created empty and I am trying to use it in a function,
#so in some dataframes the number of prices can vary.
CodePudding user response:
One approach using dplyr
:
df %>%
rowwise() %>%
mutate(price = list(gsub("$", "",strsplit(prices, " ")[[1]],fixed = T))) %>%
unnest_wider(price,names_sep = "")
Output:
prices price1 price2 price3
<chr> <chr> <chr> <chr>
1 $1,50 $1,20 1,50 1,20 NA
2 $1,50 1,50 NA NA
3 $1,75 $1,25 $1,35 1,75 1,25 1,35
Input:
df = structure(list(prices = c("$1,50 $1,20", "$1,50", "$1,75 $1,25 $1,35"
)), class = "data.frame", row.names = c(NA, -3L))
CodePudding user response:
If your default locale has comma as the decimal separator, then:
library(tidyverse)
options("readr.default_locale" = readr::locale(decimal_mark = ","))
df <- tibble(prices =
c("$1,50 $1,20",
"$1,50",
"$1,75 $1,25 $1,35"))
df |>
mutate(prices = prices |>
str_split(" ") |>
map( ~ str_remove(., "\\$"))) |>
unnest_wider(prices) |>
mutate(across(.fns = readr::parse_number))
#> New names:
#> New names:
#> New names:
#> • `` -> `...1`
#> • `` -> `...2`
#> # A tibble: 3 × 3
#> ...1 ...2 ...3
#> <dbl> <dbl> <dbl>
#> 1 1.5 1.2 NA
#> 2 1.5 NA NA
#> 3 1.75 1.25 1.35
Otherwise:
df |>
mutate(prices = prices |>
str_split(" ") |>
map( ~ str_remove(., "\\$"))) |>
unnest_wider(prices) |>
mutate(across(.fns = ~ readr::parse_number(., locale = readr::locale(decimal_mark = ","))))
#> New names:
#> New names:
#> New names:
#> • `` -> `...1`
#> • `` -> `...2`
#> # A tibble: 3 × 3
#> ...1 ...2 ...3
#> <dbl> <dbl> <dbl>
#> 1 1.5 1.2 NA
#> 2 1.5 NA NA
#> 3 1.75 1.25 1.35
CodePudding user response:
With cSplit
:
library(splitstackshape)
s <- cSplit(df, "prices", "$", type.convert = T)[, -1]
df[, paste0("price", 1:ncol(s))] <- s
# prices price1 price2 price3
#1 $1,50 $1,20 1,50 1,20 <NA>
#2 $1,50 1,50 <NA> <NA>
#3 $1,75 $1,25 $1,35 1,75 1,25 1,35
CodePudding user response:
Add a prices_text column, separate it into rows, remove rows with "" entries, add a column n which counts the rows within prices_next to use as names and convert to wide form.
library(dplyr)
library(tidyr)
DF %>%
mutate(prices_text = prices) %>%
separate_rows(prices_text, sep = "[$ ] ") %>%
filter(prices_text != "") %>%
group_by(prices) %>%
mutate(n = paste0("prices", 1:n())) %>%
ungroup %>%
pivot_wider(id_cols = prices, values_from = prices_text, names_from = n)
giving:
# A tibble: 3 × 4
prices prices1 prices2 prices3
<chr> <chr> <chr> <chr>
1 $1,50 $1,20 1,50 1,20 <NA>
2 $1,50 1,50 <NA> <NA>
3 $1,75 $1,25 $1,35 1,75 1,25 1,35