Home > Enterprise >  How to split a single character column into multiple column while converting to numeric?
How to split a single character column into multiple column while converting to numeric?

Time:11-24

I am struggling to read properly a .txt. For that, I am using:

      r_dat <- read_csv(ff,quote = ",", skip=5)

And the data looks like:

      dput(head(r_dat))
      structure(list(`--------------------------------------------------------` = 
      c("240 19790111_00    7    7   0.86587346   0.75074303   1.35784054  
      -0.45948577  -1.18579698  -1.07059395  -0.34373909", 
      "243 19790111_03    0    7   0.85441613   0.72267860   1.31580353  -0.44945070  
      -1.16703987  -1.03862977  -0.32952571", 
       "246 19790111_06    7    7   0.83927369   0.69352823   1.27102554  -0.43822104  
       -1.14390016  -1.00381625  -0.31613618", 
       "249 19790111_09    0    7   0.82096398   0.66378951   1.22433603  -0.42610571  
      -1.11709785  -0.96681082  -0.30378893", 
        "252 19790111_12    7    7   0.79906243   0.63312817   1.17505550  
      -0.41290590  -1.08599937  -0.92708772  -0.29231820", 
      "255 19790111_15    0    7   0.77413946   0.60201460   1.12398231  -0.39892274  
      -1.05132735  -0.88528168  -0.28191039"
      )), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
       ))

      

So, it results a single a character column that would contain, time, hour, and some numeric values, each one should go into a different column.

In order to properly handle the data, I need to split the column into 11 columns. So, after reading the file, I tried to use separate the column:

     r_dat%>%separate(1, into=paste("X",seq(1,11,by=1), sep=""), sep=" ")

But it is not working, and it seems that I need to add more columns. Also, when adding more than 11 columns and using the whole data dataset, I got warnings "Warning messages: 1: Expected 27 pieces. Additional pieces discarded in 57140 rows " ..

I tried several "quote" patterns (e.g. " ", "\t",), as I thought that the problem was there..but it is not working. Any suggestion will be really appreciated.

Thanks

CodePudding user response:

Looking at your data it seems you want to split into 12 elements, assuming that you want to split everywhere where one or more whitespaces occur:

r_dat %>%
  separate(1, into = paste0("X", 1:12), sep = " {1,}")

or

r_dat %>%
  mutate(new_cols = str_split(.[[1]], " {1,}")) %>%
  unnest_wider(new_cols)

The first alternative would give:

# A tibble: 6 x 12
  X1    X2          X3    X4    X5         X6         X7         X8            X9            X10         X11         X12        
  <chr> <chr>       <chr> <chr> <chr>      <chr>      <chr>      <chr>         <chr>         <chr>       <chr>       <chr>      
1 240   19790111_00 7     7     0.86587346 0.75074303 1.35784054 "\n"          "-0.45948577" -1.18579698 -1.07059395 -0.34373909
2 243   19790111_03 0     7     0.85441613 0.72267860 1.31580353 "-0.44945070" "\n"          -1.16703987 -1.03862977 -0.32952571
3 246   19790111_06 7     7     0.83927369 0.69352823 1.27102554 "-0.43822104" "\n"          -1.14390016 -1.00381625 -0.31613618
4 249   19790111_09 0     7     0.82096398 0.66378951 1.22433603 "-0.42610571" "\n"          -1.11709785 -0.96681082 -0.30378893
5 252   19790111_12 7     7     0.79906243 0.63312817 1.17505550 "\n"          "-0.41290590" -1.08599937 -0.92708772 -0.29231820
6 255   19790111_15 0     7     0.77413946 0.60201460 1.12398231 "-0.39892274" "\n"          -1.05132735 -0.88528168 -0.28191039

Getting to your second point of converting to numeric, you could try:

r_dat %>%
  separate(1, into = paste0("X", 1:12), sep = " {1,}") %>%
  mutate(across(everything(), ~type.convert(.)))

which gives:

# A tibble: 6 x 12
     X1 X2             X3    X4    X5    X6    X7     X8     X9   X10    X11    X12
  <int> <chr>       <int> <int> <dbl> <dbl> <dbl>  <dbl>  <dbl> <dbl>  <dbl>  <dbl>
1   240 19790111_00     7     7 0.866 0.751  1.36 NA     -0.459 -1.19 -1.07  -0.344
2   243 19790111_03     0     7 0.854 0.723  1.32 -0.449 NA     -1.17 -1.04  -0.330
3   246 19790111_06     7     7 0.839 0.694  1.27 -0.438 NA     -1.14 -1.00  -0.316
4   249 19790111_09     0     7 0.821 0.664  1.22 -0.426 NA     -1.12 -0.967 -0.304
5   252 19790111_12     7     7 0.799 0.633  1.18 NA     -0.413 -1.09 -0.927 -0.292
6   255 19790111_15     0     7 0.774 0.602  1.12 -0.399 NA     -1.05 -0.885 -0.282
  •  Tags:  
  • r
  • Related