Home > front end >  How to split one column into multiple column with each column having similar values in R
How to split one column into multiple column with each column having similar values in R

Time:08-20

I have a column in my dataset which I want to split into multiple columns. Below is the column from my dataset with 10 rows.

> dput(extract_df$Res)
c("\n- Provided disclaimer, Disconnected", "\n- confirmed the number,\n- pin was verified,\n- Explained reactivation,\n- confirmed the zip-code,\n - process explained,\n- changed the billing cycle,\n- Bill Amount :  60 dollars", 
"\n- confirmed the number", "\n- confirmed the number,\n- pin was verified,\n- Confirmed that the line is suspended due to non-payment", 
"\n- confirmed the number,\n- pin was verified,\n- Explained reactivation,\n- Activation fee charged,\n- Confirmed that the line is suspended due to non-payment,\n- $4 processing fee waived,\n- informed the customer about the payment and changing billing cycle,\n- changed the billing cycle,\n- manually processed the payment,\n- Payment completed successfully,\n- Due Date : next month,\n- Promised back", 
"\n- confirmed the number,\n- pin was verified,\n- Requested for sim number", 
"\n- confirmed the number,\n- Inquired about signal or network status​", 
"\n- confirmed the number,\n- Explained reactivation,\n- Activation fee charged,\n- Confirmed that the line is suspended due to non-payment,\n- confirmed the zip-code,\n- $4 processing fee waived,\n - process explained,\n- Provided different payment options,\n- manually processed the payment,\n- Quickpay link sent,\n- Payment completed successfully,\n- Due Date : 8/31/", 
"\n- confirmed the number,\n- confirmed the zip-code,\n- $4 processing fee charged,\n- updated the address,\n- provided information or transferred the call to Insurance/Assurance Department,\n- manually processed the payment,\n- guided customer to make the payment,\n- Payment completed successfully,\n- Bill Amount :  98 dollars,\n- Due Date : 8/31/", 
"\n- confirmed the number,\n- informed the customer to accept the terms and conditions,\n- confirmed the zip-code,\n- $4 processing fee charged,\n- Successfully restored the cancelled line,\n- informed the customer about the payment and changing billing cycle,\n- Autopay was successfully setup,\n- Quickpay link sent,\n- Payment completed successfully,\n- Due Date : 8/31/"
)

I want to split this column into multiple column. Split should be in such a way that split columns must have similar values from each row, if value not present NA should be assigned

My desired output should be like


col1                                col2                   col3              col4                       col5                        col6                   col7                              col8                                col9                    col10                                                     col11                     col11                                                                col12
Provided disclaimer, Disconnected    NA                    NA                NA                         NA                          NA                     NA                                NA                                  NA                      NA                                                        NA                        NA                                                                   NA
NA                                   confirmed the number  pin was verified  Explained reactivation     confirmed the zip-code      process explained      changed the billing cycle         Bill Amount :  60 dollars           NA                      NA                                                        NA                        NA                                                                   NA
NA                                   confirmed the number  NA                NA                         NA                          NA                     NA                                NA                                  NA                      NA                                                        NA                        NA                                                                   NA
NA                                   confirmed the number  pin was verified  Explained reactivation     NA                          NA                     NA                                NA                                  Activation fee charged  Confirmed that the line is suspended due to non-payment   $4 processing fee waived  informed the customer about the payment and changing billing cycle   NA
NA                                   confirmed the number  pin was verified  NA                         NA                          NA                     NA                                NA                                  NA                      NA                                                        NA                        NA                                                                   Requested for sim number          
:
:

I am very new to R and I would really appreciate any help in this

CodePudding user response:

You could use separate to get somewhere near based on the separator \n-. You won't get your expected output, as the text gives no clues on why e.g. "Provided disclaimer, Disconnected" should be in col1 and "confirmed the number" in col2.

library(dplyr)

textvector |>
  as_tibble() |> 
  separate(value, into = paste0("col", 0:12), sep = "\n-") |> 
  select(-1) |>
  mutate(across(everything(), ~ trimws(trimws(.), whitespace = ",")))

Output:

# A tibble: 10 × 12
   col1                              col2                col3  col4  col5  col6  col7  col8  col9  col10 col11 col12
   <chr>                             <chr>               <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
 1 Provided disclaimer, Disconnected NA                  NA     NA   NA     NA   NA    NA    NA    NA    NA    NA   
 2 confirmed the number              pin was verified    Expl… "con… chan… "Bil… NA    NA    NA    NA    NA    NA   
 3 confirmed the number              NA                  NA     NA   NA     NA   NA    NA    NA    NA    NA    NA   
 4 confirmed the number              pin was verified    Conf…  NA   NA     NA   NA    NA    NA    NA    NA    NA   
 5 confirmed the number              pin was verified    Expl… "Act… Conf… "$4 … info… chan… manu… Paym… Due … Prom…
 6 confirmed the number              pin was verified    Requ…  NA   NA     NA   NA    NA    NA    NA    NA    NA   
 7 confirmed the number              Inquired about sig… NA     NA   NA     NA   NA    NA    NA    NA    NA    NA   
 8 confirmed the number              Explained reactiva… Acti… "Con… conf… "$4 … Prov… manu… Quic… Paym… Due … NA   
 9 confirmed the number              confirmed the zip-… $4 p… "upd… prov… "man… guid… Paym… Bill… Due … NA    NA   
10 confirmed the number              informed the custo… conf… "$4 … Succ… "inf… Auto… Quic… Paym… Due … NA    NA   

CodePudding user response:

I am not sure if this solves all the corner cases, but should work as expected.

library(dplyr)
library(janitor)
library(data.table)
library(stringr)

extract_df <- data.table(extract_df)
dt <- extract_df[,.(Res = str_split_fixed(Res, n = 14, pattern ="\\n-" ))] %>% 
mutate(across(everything(), ~ trimws(trimws(.), whitespace = ",")))%>% na_if(., '') %>% remove_empty(, which = "cols")

The resulting data.frame looks like this:

   # A tibble: 10 x 12
   Res.V2                            Res.V3                                                  Res.V4 Res.V5 Res.V6 Res.V7 Res.V8 Res.V9 Res.V10 Res.V11 Res.V12 Res.V13
   <chr>                             <chr>                                                   <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr>   <chr>   <chr>   <chr>  
 1 Provided disclaimer, Disconnected NA                                                      NA      NA    NA      NA    NA     NA     NA      NA      NA      NA     
 2 confirmed the number              pin was verified                                        Expla~ "conf~ chang~ "Bill~ NA     NA     NA      NA      NA      NA     
 3 confirmed the number              NA                                                      NA      NA    NA      NA    NA     NA     NA      NA      NA      NA     
 4 confirmed the number              pin was verified                                        Confi~  NA    NA      NA    NA     NA     NA      NA      NA      NA     
 5 confirmed the number              pin was verified                                        Expla~ "Acti~ Confi~ "$4 p~ infor~ chang~ manual~ Paymen~ Due Da~ Promis~
 6 confirmed the number              pin was verified                                        Reque~  NA    NA      NA    NA     NA     NA      NA      NA      NA     
 7 confirmed the number              Inquired about signal or network status<U 200B>                 NA      NA    NA      NA    NA     NA     NA      NA      NA      NA     
 8 confirmed the number              Explained reactivation                                  Activ~ "Conf~ confi~ "$4 p~ Provi~ manua~ Quickp~ Paymen~ Due Da~ NA     
 9 confirmed the number              confirmed the zip-code                                  $4 pr~ "upda~ provi~ "manu~ guide~ Payme~ Bill A~ Due Da~ NA      NA     
10 confirmed the number              informed the customer to accept the terms and conditio~ confi~ "$4 p~ Succe~ "info~ Autop~ Quick~ Paymen~ Due Da~ NA      NA     
  • Related