I have a column in my dataset which I want to split into multiple columns. Below is the column from my dataset with 10 rows.
> dput(extract_df$Res)
c("\n- Provided disclaimer, Disconnected", "\n- confirmed the number,\n- pin was verified,\n- Explained reactivation,\n- confirmed the zip-code,\n - process explained,\n- changed the billing cycle,\n- Bill Amount : 60 dollars",
"\n- confirmed the number", "\n- confirmed the number,\n- pin was verified,\n- Confirmed that the line is suspended due to non-payment",
"\n- confirmed the number,\n- pin was verified,\n- Explained reactivation,\n- Activation fee charged,\n- Confirmed that the line is suspended due to non-payment,\n- $4 processing fee waived,\n- informed the customer about the payment and changing billing cycle,\n- changed the billing cycle,\n- manually processed the payment,\n- Payment completed successfully,\n- Due Date : next month,\n- Promised back",
"\n- confirmed the number,\n- pin was verified,\n- Requested for sim number",
"\n- confirmed the number,\n- Inquired about signal or network status",
"\n- confirmed the number,\n- Explained reactivation,\n- Activation fee charged,\n- Confirmed that the line is suspended due to non-payment,\n- confirmed the zip-code,\n- $4 processing fee waived,\n - process explained,\n- Provided different payment options,\n- manually processed the payment,\n- Quickpay link sent,\n- Payment completed successfully,\n- Due Date : 8/31/",
"\n- confirmed the number,\n- confirmed the zip-code,\n- $4 processing fee charged,\n- updated the address,\n- provided information or transferred the call to Insurance/Assurance Department,\n- manually processed the payment,\n- guided customer to make the payment,\n- Payment completed successfully,\n- Bill Amount : 98 dollars,\n- Due Date : 8/31/",
"\n- confirmed the number,\n- informed the customer to accept the terms and conditions,\n- confirmed the zip-code,\n- $4 processing fee charged,\n- Successfully restored the cancelled line,\n- informed the customer about the payment and changing billing cycle,\n- Autopay was successfully setup,\n- Quickpay link sent,\n- Payment completed successfully,\n- Due Date : 8/31/"
)
I want to split this column into multiple column. Split should be in such a way that split columns must have similar values from each row, if value not present NA should be assigned
My desired output should be like
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 col11 col11 col12
Provided disclaimer, Disconnected NA NA NA NA NA NA NA NA NA NA NA NA
NA confirmed the number pin was verified Explained reactivation confirmed the zip-code process explained changed the billing cycle Bill Amount : 60 dollars NA NA NA NA NA
NA confirmed the number NA NA NA NA NA NA NA NA NA NA NA
NA confirmed the number pin was verified Explained reactivation NA NA NA NA Activation fee charged Confirmed that the line is suspended due to non-payment $4 processing fee waived informed the customer about the payment and changing billing cycle NA
NA confirmed the number pin was verified NA NA NA NA NA NA NA NA NA Requested for sim number
:
:
I am very new to R and I would really appreciate any help in this
CodePudding user response:
You could use separate
to get somewhere near based on the separator \n-
. You won't get your expected output, as the text gives no clues on why e.g. "Provided disclaimer, Disconnected" should be in col1 and "confirmed the number" in col2.
library(dplyr)
textvector |>
as_tibble() |>
separate(value, into = paste0("col", 0:12), sep = "\n-") |>
select(-1) |>
mutate(across(everything(), ~ trimws(trimws(.), whitespace = ",")))
Output:
# A tibble: 10 × 12
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 col11 col12
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Provided disclaimer, Disconnected NA NA NA NA NA NA NA NA NA NA NA
2 confirmed the number pin was verified Expl… "con… chan… "Bil… NA NA NA NA NA NA
3 confirmed the number NA NA NA NA NA NA NA NA NA NA NA
4 confirmed the number pin was verified Conf… NA NA NA NA NA NA NA NA NA
5 confirmed the number pin was verified Expl… "Act… Conf… "$4 … info… chan… manu… Paym… Due … Prom…
6 confirmed the number pin was verified Requ… NA NA NA NA NA NA NA NA NA
7 confirmed the number Inquired about sig… NA NA NA NA NA NA NA NA NA NA
8 confirmed the number Explained reactiva… Acti… "Con… conf… "$4 … Prov… manu… Quic… Paym… Due … NA
9 confirmed the number confirmed the zip-… $4 p… "upd… prov… "man… guid… Paym… Bill… Due … NA NA
10 confirmed the number informed the custo… conf… "$4 … Succ… "inf… Auto… Quic… Paym… Due … NA NA
CodePudding user response:
I am not sure if this solves all the corner cases, but should work as expected.
library(dplyr)
library(janitor)
library(data.table)
library(stringr)
extract_df <- data.table(extract_df)
dt <- extract_df[,.(Res = str_split_fixed(Res, n = 14, pattern ="\\n-" ))] %>%
mutate(across(everything(), ~ trimws(trimws(.), whitespace = ",")))%>% na_if(., '') %>% remove_empty(, which = "cols")
The resulting data.frame looks like this:
# A tibble: 10 x 12
Res.V2 Res.V3 Res.V4 Res.V5 Res.V6 Res.V7 Res.V8 Res.V9 Res.V10 Res.V11 Res.V12 Res.V13
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Provided disclaimer, Disconnected NA NA NA NA NA NA NA NA NA NA NA
2 confirmed the number pin was verified Expla~ "conf~ chang~ "Bill~ NA NA NA NA NA NA
3 confirmed the number NA NA NA NA NA NA NA NA NA NA NA
4 confirmed the number pin was verified Confi~ NA NA NA NA NA NA NA NA NA
5 confirmed the number pin was verified Expla~ "Acti~ Confi~ "$4 p~ infor~ chang~ manual~ Paymen~ Due Da~ Promis~
6 confirmed the number pin was verified Reque~ NA NA NA NA NA NA NA NA NA
7 confirmed the number Inquired about signal or network status<U 200B> NA NA NA NA NA NA NA NA NA NA
8 confirmed the number Explained reactivation Activ~ "Conf~ confi~ "$4 p~ Provi~ manua~ Quickp~ Paymen~ Due Da~ NA
9 confirmed the number confirmed the zip-code $4 pr~ "upda~ provi~ "manu~ guide~ Payme~ Bill A~ Due Da~ NA NA
10 confirmed the number informed the customer to accept the terms and conditio~ confi~ "$4 p~ Succe~ "info~ Autop~ Quick~ Paymen~ Due Da~ NA NA