Split Row by Condition in R

Time:01-10

I have a data frame with the coefficients and standard errors (in parentheses) of a regression model. It looks like this:

data =  data.frame(Variable = c('x1', 'x2', 'x3'), 
           Coefficients_se1 = c('0.04 (0.03)', '', ''), 
           Coefficients_se2 = c('', '0.08*** (0.01)', ''), 
           Coefficients_se3 = c('', '', '0.02* (0.01)'))

print(data)

  Variable Coefficients_se1 Coefficients_se2 Coefficients_se3
1       x1      0.04 (0.03)                                  
2       x2                    0.08*** (0.01)                 
3       x3                                       0.02* (0.01)

I would like to know if there is a way to move the values in parentheses to a new row directly below their coefficient, within the same dataset, so that the output looks like this:

  Variable Coefficients_se1 Coefficients_se2 Coefficients_se3
1       x1             0.04                                  
2                    (0.03)                                  
3       x2                           0.08***                 
4                                     (0.01)                 
5       x3                                              0.02*
6                                                      (0.01)

Is it possible to do this in R?

CodePudding user response:

I can't think of an 'easy' way to achieve your expected output, but you can manipulate your data using tidyverse functions, e.g.

library(tidyverse)

data = data.frame(Variable = c('x1', 'x2', 'x3'), 
                   Coefficients_se1 = c('0.04 (0.03)', '', ''), 
                   Coefficients_se2 = c('', '0.08*** (0.01)', ''), 
                   Coefficients_se3 = c('', '', '0.02* (0.01)'))

data %>%
  pivot_longer(-Variable) %>%
  filter(value != "") %>%
  separate(col = value, into = c("coef", "stderr"),
           sep = " ") %>%
  pivot_longer(-c(name, Variable),
               names_to = "stat") %>%
  pivot_wider(names_from = name,
              values_from = value)
#> # A tibble: 6 × 5
#>   Variable stat   Coefficients_se1 Coefficients_se2 Coefficients_se3
#>   <chr>    <chr>  <chr>            <chr>            <chr>           
#> 1 x1       coef   0.04             <NA>             <NA>            
#> 2 x1       stderr (0.03)           <NA>             <NA>            
#> 3 x2       coef   <NA>             0.08***          <NA>            
#> 4 x2       stderr <NA>             (0.01)           <NA>            
#> 5 x3       coef   <NA>             <NA>             0.02*           
#> 6 x3       stderr <NA>             <NA>             (0.01)

Created on 2023-01-10 with reprex v2.0.2
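
The tibble above differs from the requested layout in two ways: empty cells show NA, and the variable name is repeated on the standard-error row. As a complement, here is a minimal base R sketch that builds the requested layout directly. It assumes every non-empty cell has the form "coefficient (se)" with a single space between the two parts; `split_rows` is a hypothetical helper name chosen for illustration, not a function from any package.

```r
# Same example data as in the question
data <- data.frame(
  Variable = c("x1", "x2", "x3"),
  Coefficients_se1 = c("0.04 (0.03)", "", ""),
  Coefficients_se2 = c("", "0.08*** (0.01)", ""),
  Coefficients_se3 = c("", "", "0.02* (0.01)"),
  stringsAsFactors = FALSE
)

# Columns that hold "coefficient (se)" strings
cols <- grep("^Coefficients", names(data), value = TRUE)

# Hypothetical helper: one output row for the coefficient, one for the std. error
split_rows <- function(df, cols) {
  # Variable names go on the coefficient rows, blanks on the std. error rows.
  # rbind() stacks the two vectors and as.vector() reads the result column-wise,
  # which interleaves them: x1, "", x2, "", ...
  out <- data.frame(
    Variable = as.vector(rbind(as.character(df$Variable), "")),
    stringsAsFactors = FALSE
  )
  for (col in cols) {
    cell <- as.character(df[[col]])
    est <- sub(" .*$", "", cell)       # text before the first space
    se  <- sub("^[^ ]* ?", "", cell)   # text after the first space
    out[[col]] <- as.vector(rbind(est, se))
  }
  out
}

result <- split_rows(data, cols)
print(result)
```

Empty cells stay empty because both regular expressions leave an empty string unchanged. If some cells use a different separator or contain extra spaces, the two `sub()` patterns would need adjusting.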
