I am new to R and trying to mutate the following character variable "Tax.Rate..." into four different columns (i.e., CGST, SGST, UTGST, and IGST) with tax rates applicable to that head under that column. Example of dataset shown below:
df # A tibble: 3 x 1 Tax.Rate....
1 "CGST 2.5% SGST 2.5%" 2 "CGST 6% UTGST 6%"
3 "IGST 12% "
I have tried using 'separate' and 'mutate' functions with little success
Any guidance would be appreciated
CodePudding user response:
I'm sure this can be done concisely in base R as well, but here's a tidyverse approach where I first split the data into a new row at each plus, then trim the extra spaces, then split into two columns.
library(tidyverse)
df <- data.frame(Tax.Rate = c("CGST 2.5% SGST 2.5%", "CGST 6% UTGST 6%", "IGST 12% "))
df %>%
mutate(orig_row = row_number()) %>% # optional, for later tracking
separate_rows(Tax.Rate, sep = "\\ ") %>%
mutate(Tax.Rate = str_trim(Tax.Rate)) %>%
separate(Tax.Rate, c("group", "rate"), extra = "merge", remove = FALSE)
# A tibble: 5 × 4
Tax.Rate group rate orig_row
<chr> <chr> <chr> <int>
1 CGST 2.5% CGST 2.5% 1
2 SGST 2.5% SGST 2.5% 1
3 CGST 6% CGST 6% 2
4 UTGST 6% UTGST 6% 2
5 IGST 12% IGST 12% 3
This will produce a "long" shaped table, but if you want it "wide" with separate columns for each group (jurisdiction?) then you could add the following:
[from the end of the "separate()" line] %>%
select(-Tax.Rate) %>%
pivot_wider(names_from = group, values_from = rate)
for this result
# A tibble: 3 × 5
orig_row CGST SGST UTGST IGST
<int> <chr> <chr> <chr> <chr>
1 1 2.5% 2.5% NA NA
2 2 6% NA 6% NA
3 3 NA NA NA 12%
CodePudding user response:
We could:
- Use
separate_rows
separating by\\
to escape the special character - then
str_trim
to remove starting space etc... separate
this column by" "
4.group_by
and addid
to avoid nested outputpivot_wider
library(dplyr)
library(tidyr)
library(stringr)
df %>%
separate_rows(Tax.Rate, sep = "\\ ") %>%
mutate(Tax.Rate = str_trim(Tax.Rate)) %>%
separate(Tax.Rate, c("name", "value"), sep = " ") %>%
group_by(name) %>%
mutate(id = row_number()) %>%
pivot_wider(
names_from = name,
values_from = value
) %>%
select(-id)
CGST SGST UTGST IGST
<chr> <chr> <chr> <chr>
1 2.5% 2.5% 6% 12%
2 6% NA NA NA
data:
structure(list(Tax.Rate = c("CGST 2.5% SGST 2.5%", "CGST 6% UTGST 6%",
"IGST 12%")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-3L))