I have some random effect coefficients extracted from an R model object. For a random intercept, they look like this:
xx <- data.frame(
`Estimate.Intercept` = c(-0.1, -0.2),
`Est.Error.Intercept` = c(0.7, 0.8),
`Q5.Intercept` = c(-1.5, -1.4),
`Q95.Intercept` = c(0.7, 0.8)
)
I'm formatting the data for a .csv report and trying to generate a 'long' data.frame/tibble with term_type taken from the first part of the column name and term taken from the second part. It mostly works with pivot_longer from the tidyr package:
tidyr::pivot_longer(
data = xx,
cols = everything(),
names_sep = '\\.',
names_to = c('term_type', 'term'),
values_to = 'term_val'
)
The result looks like this:
# A tibble: 8 x 3
term_type term term_val
<chr> <chr> <dbl>
1 Estimate Intercept -0.140
2 Est Error 0.775
3 Q5 Intercept -1.57
4 Q95 Intercept 0.773
5 Estimate Intercept -0.140
6 Est Error 0.777
7 Q5 Intercept -1.55
8 Q95 Intercept 0.792
But it throws this warning:
Warning message:
Expected 2 pieces. Additional pieces discarded in 1 rows [2].
Can I use the names_sep argument to specify that I want the second piece of the split string, but only for the second column? That is, I want Error instead of Est. I've fixed it for now using an ifelse, but I'm wondering if it can be done within the call itself. My instinct is that there's some clever regex, or perhaps something using stringr, but I'm stumped for now...
CodePudding user response:
Some of the column names contain more than one . (for example, Est.Error.Intercept). It may be better to use names_pattern with capture groups ((...)) that don't include any . as characters ([^.]). In addition, specify the end of the string with $:
tidyr::pivot_longer(
data = xx,
cols = everything(),
names_pattern = "([^.]+)\\.([^.]+)$",
names_to = c('term_type', 'term'),
values_to = 'term_val'
)
-output
# A tibble: 8 × 3
term_type term term_val
<chr> <chr> <dbl>
1 Estimate Intercept -0.1
2 Error Intercept 0.7
3 Q5 Intercept -1.5
4 Q95 Intercept 0.7
5 Estimate Intercept -0.2
6 Error Intercept 0.8
7 Q5 Intercept -1.4
8 Q95 Intercept 0.8
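"([^.]+)\\.([^.]+)$" captures two groups: 1) ([^.]+), one or more characters that are not a ., followed by a literal . (\\.), and 2) a second run of characters that are not a ., extending to the end ($) of the string. Because the pattern is not anchored at the start, the match for Est.Error.Intercept begins at Error, so the leading Est. is simply ignored.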
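To see what the pattern captures without running the pivot, here is a minimal base R sketch. It assumes the four column names from the question; the variable names cols, pattern, and parts are only illustrative.

```r
# Column names from the question's example data
cols <- c("Estimate.Intercept", "Est.Error.Intercept",
          "Q5.Intercept", "Q95.Intercept")

# Same pattern as passed to names_pattern
pattern <- "([^.]+)\\.([^.]+)$"

# regexec() returns the full match plus each capture group;
# elements 2 and 3 are the two groups
m <- regmatches(cols, regexec(pattern, cols))
parts <- do.call(rbind, lapply(m, function(x) x[2:3]))
colnames(parts) <- c("term_type", "term")
parts
```

The term_type column should come out as Estimate, Error, Q5, Q95, and the term column as Intercept for every name, which matches the tibble shown above.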
CodePudding user response:
You may keep names_sep by using rename_with first:
library(dplyr)
library(stringr)
xx %>%
rename_with(~str_replace(., '\\.Intercept', '_Intercept')) %>%
tidyr::pivot_longer(
cols = everything(),
names_sep = '\\_',
names_to = c('term_type', 'term'),
values_to = 'term_val'
)
term_type term term_val
<chr> <chr> <dbl>
1 Estimate Intercept -0.1
2 Est.Error Intercept 0.7
3 Q5 Intercept -1.5
4 Q95 Intercept 0.7
5 Estimate Intercept -0.2
6 Est.Error Intercept 0.8
7 Q5 Intercept -1.4
8 Q95 Intercept 0.8
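If the term is not always Intercept, the renaming step could be generalised so that only the last . in each name becomes the separator. The following is a rough base R sketch of that idea, not part of the answer above; it assumes the question's xx data frame, and new_names is a hypothetical variable name.

```r
# Example data from the question
xx <- data.frame(
  Estimate.Intercept  = c(-0.1, -0.2),
  Est.Error.Intercept = c(0.7, 0.8),
  Q5.Intercept        = c(-1.5, -1.4),
  Q95.Intercept       = c(0.7, 0.8)
)

# Replace only the final "." (the one followed by no further dots)
# with "_", keeping whatever text comes after it
new_names <- sub("\\.([^.]+)$", "_\\1", names(xx))
new_names
```

This should give Estimate_Intercept, Est.Error_Intercept, Q5_Intercept and Q95_Intercept, after which splitting on _ yields the same term_type and term columns as above. The same sub() call could presumably be used inside rename_with in place of str_replace.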