Using separate and mutate functions-CodePudding

I am new to R and trying to mutate the following character variable "Tax.Rate..." into four different columns (i.e., CGST, SGST, UTGST, and IGST) with tax rates applicable to that head under that column. Example of dataset shown below:

df # A tibble: 3 x 1 Tax.Rate....

1 "CGST 2.5% SGST 2.5%" 2 "CGST 6% UTGST 6%"
3 "IGST 12% "

I have tried using 'separate' and 'mutate' functions with little success

Any guidance would be appreciated

CodePudding user response：

I'm sure this can be done concisely in base R as well, but here's a tidyverse approach where I first split the data into a new row at each plus, then trim the extra spaces, then split into two columns.

library(tidyverse)
df <- data.frame(Tax.Rate = c("CGST 2.5%   SGST 2.5%", "CGST 6%   UTGST 6%", "IGST 12% "))

df %>%
  mutate(orig_row = row_number()) %>% # optional, for later tracking
  separate_rows(Tax.Rate, sep = "\\ ") %>%
  mutate(Tax.Rate = str_trim(Tax.Rate)) %>%
  separate(Tax.Rate, c("group", "rate"), extra = "merge", remove = FALSE)

# A tibble: 5 × 4
  Tax.Rate  group rate  orig_row
  <chr>     <chr> <chr>    <int>
1 CGST 2.5% CGST  2.5%         1
2 SGST 2.5% SGST  2.5%         1
3 CGST 6%   CGST  6%           2
4 UTGST 6%  UTGST 6%           2
5 IGST 12%  IGST  12%          3

This will produce a "long" shaped table, but if you want it "wide" with separate columns for each group (jurisdiction?) then you could add the following:

[from the end of the "separate()" line] %>%
select(-Tax.Rate) %>%
pivot_wider(names_from = group, values_from = rate)

for this result

# A tibble: 3 × 5
  orig_row CGST  SGST  UTGST IGST 
     <int> <chr> <chr> <chr> <chr>
1        1 2.5%  2.5%  NA    NA   
2        2 6%    NA    6%    NA   
3        3 NA    NA    NA    12%

CodePudding user response：

We could:

Use separate_rows separating by using \\ to escape the special character
then str_trim to remove starting space etc...
separate this column by " " 4.group_by and add id to avoid nested output
pivot_wider

library(dplyr)
library(tidyr)
library(stringr)
df %>% 
  separate_rows(Tax.Rate, sep = "\\ ") %>% 
  mutate(Tax.Rate = str_trim(Tax.Rate)) %>% 
  separate(Tax.Rate, c("name", "value"), sep = " ") %>% 
  group_by(name) %>% 
  mutate(id = row_number()) %>% 
  pivot_wider(
    names_from = name, 
    values_from = value
  ) %>% 
  select(-id)

  CGST  SGST  UTGST IGST 
  <chr> <chr> <chr> <chr>
1 2.5%  2.5%  6%    12%  
2 6%    NA    NA    NA

data:

structure(list(Tax.Rate = c("CGST 2.5%   SGST 2.5%", "CGST 6%   UTGST 6%", 
"IGST 12%")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-3L))