I have this data frame:
data1 <- structure(list(attr = c("kind1", "kind2", "kind3", "price1",
"price2", "packing1", "weight1", "weight2", "calorie1"), coef = c(-1.08908045977012,
-0.732758620689656, -0.922413793103449, -0.570881226053641, 0.118773946360153,
-0.0287356321839081, -0.168582375478927, 0.173371647509578, -0.646551724137931
), pval = c(0.0000000461586619475345, 0.000225855110699109, 0.00000354973103147522,
0.000189625500287816, 0.506777189443937, 0.801713589134903, 0.269271977099465,
0.33257496253009, 0.0000000192904668116847)), row.names = c(NA,
-9L), class = "data.frame")
# attr coef pval
#1 kind1 -1.08908046 0.00000004615866
#2 kind2 -0.73275862 0.00022585511070
#3 kind3 -0.92241379 0.00000354973103
#4 price1 -0.57088123 0.00018962550029
#5 price2 0.11877395 0.50677718944394
#6 packing1 -0.02873563 0.80171358913490
#7 weight1 -0.16858238 0.26927197709946
#8 weight2 0.17337165 0.33257496253009
#9 calorie1 -0.64655172 0.00000001929047
I'm trying to sum by groups, where the groups are identified by a regex that matches the common part of each name up to a certain point, in this case until a number appears.
For example, in the case of my variables, there would be 5 groups:
kind
Total = kind sum
price
Total = price sum
packing
Total= packing sum
weight
Total = weight sum
calorie
Total = calorie sum
I wrote this code, but I don't know how to build the regex or where to put it. I tried using stringr but couldn't get what I want:
data1 %>%
dplyr::arrange(attr) %>%
split(f = .[,"attr"]) %>%
purrr::map_df(., janitor::adorn_totals)
# attr coef pval
# calorie1 -0.64655172 0.00000001929047
# Total -0.64655172 0.00000001929047
# kind1 -1.08908046 0.00000004615866
# Total -1.08908046 0.00000004615866
# kind2 -0.73275862 0.00022585511070
# Total -0.73275862 0.00022585511070
# kind3 -0.92241379 0.00000354973103
# Total -0.92241379 0.00000354973103
# packing1 -0.02873563 0.80171358913490
# Total -0.02873563 0.80171358913490
# price1 -0.57088123 0.00018962550029
# Total -0.57088123 0.00018962550029
# price2 0.11877395 0.50677718944394
# Total 0.11877395 0.50677718944394
# weight1 -0.16858238 0.26927197709946
# Total -0.16858238 0.26927197709946
# weight2 0.17337165 0.33257496253009
# Total 0.17337165 0.33257496253009
This sums each row on its own, because the attr values differ by their trailing number, so every row ends up in a separate group. I need a regex that captures this:
kind
price
packing
weight
calorie
That is, capture the letters up to the point where a number appears.
CodePudding user response:
You can create a grouping variable by removing the digits from the attr variable, and then use group_modify:
library(dplyr)
library(stringr)

data1 %>%
group_by(grp = str_remove_all(attr, "[0-9]")) %>%
group_modify(janitor::adorn_totals, where = "row") %>%
ungroup() %>%
select(-grp)
# # A tibble: 14 × 3
# attr coef pval
# <chr> <dbl> <dbl>
# 1 calorie1 -0.647 0.0000000193
# 2 Total -0.647 0.0000000193
# 3 kind1 -1.09 0.0000000462
# 4 kind2 -0.733 0.000226
# 5 kind3 -0.922 0.00000355
# 6 Total -2.74 0.000229
# 7 packing1 -0.0287 0.802
# 8 Total -0.0287 0.802
# 9 price1 -0.571 0.000190
# 10 price2 0.119 0.507
# 11 Total -0.452 0.507
# 12 weight1 -0.169 0.269
# 13 weight2 0.173 0.333
# 14 Total 0.00479 0.602
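To see what the grouping key looks like on its own, here is a minimal base R sketch. It only illustrates the digit-stripping step and does not replace the pipeline above. The vector names `attrs`, `coefs`, `grp` and `totals` are made up for this example, the values are the rounded ones printed in the question, and `gsub()` is used as a base R stand-in for `str_remove_all()`.

```r
# Values copied (rounded) from the data frame printed in the question.
attrs <- c("kind1", "kind2", "kind3", "price1", "price2",
           "packing1", "weight1", "weight2", "calorie1")
coefs <- c(-1.08908046, -0.73275862, -0.92241379, -0.57088123, 0.11877395,
           -0.02873563, -0.16858238, 0.17337165, -0.64655172)

# Drop every digit, leaving only the letter prefix as the group key.
grp <- gsub("[0-9]", "", attrs)
print(unique(grp))

# Sum coef within each key; tapply orders the result alphabetically by key.
totals <- tapply(coefs, grp, sum)
print(round(totals, 3))
```

Once the key is right, the same vector can serve as the grouping column for whichever totalling function you prefer.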
CodePudding user response:
Something like this:
We could use group_split() after extracting the leading letters of attr as an id. This gives a list of data frames, which we can then iterate over with map_dfr, applying adorn_totals to each one:
library(tidyverse)
library(janitor)
data1 %>%
group_split(id = str_extract(attr, '[A-Za-z]+')) %>%
map_dfr(., adorn_totals) %>%
select(-id) %>%
as_tibble()
attr coef pval
<chr> <dbl> <dbl>
1 calorie1 -0.647 0.0000000193
2 Total -0.647 0.0000000193
3 kind1 -1.09 0.0000000462
4 kind2 -0.733 0.000226
5 kind3 -0.922 0.00000355
6 Total -2.74 0.000229
7 packing1 -0.0287 0.802
8 Total -0.0287 0.802
9 price1 -0.571 0.000190
10 price2 0.119 0.507
11 Total -0.452 0.507
12 weight1 -0.169 0.269
13 weight2 0.173 0.333
14 Total 0.00479 0.602
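The two answers build the group key in slightly different ways, and the difference only shows up when letters follow the digits. The sketch below assumes stringr is installed. The name "size2x" is hypothetical and is not in data1; it is there only to show the divergence.

```r
library(stringr)

# "size2x" is a made-up name used for illustration; it is not in data1.
x <- c("kind1", "weight2", "size2x")

# Deletes digits wherever they occur, keeping all letters on both sides.
removed <- str_remove_all(x, "[0-9]")

# Keeps only the first unbroken run of letters and ignores the rest.
extracted <- str_extract(x, "[A-Za-z]+")

print(data.frame(x, removed, extracted))
```

For names like those in the question, where digits appear only at the end, both approaches give the same key.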