How to split a single character column into multiple column while converting to numeric?-CodePudding

I am struggling to read properly a .txt. For that, I am using:

      r_dat <- read_csv(ff,quote = ",", skip=5)

And the data looks like:

      dput(head(r_dat))
      structure(list(`--------------------------------------------------------` = 
      c("240 19790111_00    7    7   0.86587346   0.75074303   1.35784054  
      -0.45948577  -1.18579698  -1.07059395  -0.34373909", 
      "243 19790111_03    0    7   0.85441613   0.72267860   1.31580353  -0.44945070  
      -1.16703987  -1.03862977  -0.32952571", 
       "246 19790111_06    7    7   0.83927369   0.69352823   1.27102554  -0.43822104  
       -1.14390016  -1.00381625  -0.31613618", 
       "249 19790111_09    0    7   0.82096398   0.66378951   1.22433603  -0.42610571  
      -1.11709785  -0.96681082  -0.30378893", 
        "252 19790111_12    7    7   0.79906243   0.63312817   1.17505550  
      -0.41290590  -1.08599937  -0.92708772  -0.29231820", 
      "255 19790111_15    0    7   0.77413946   0.60201460   1.12398231  -0.39892274  
      -1.05132735  -0.88528168  -0.28191039"
      )), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
       ))

So, it results a single a character column that would contain, time, hour, and some numeric values, each one should go into a different column.

In order to properly handle the data, I need to split the column into 11 columns. So, after reading the file, I tried to use separate the column:

     r_dat%>%separate(1, into=paste("X",seq(1,11,by=1), sep=""), sep=" ")

But it is not working, and it seems that I need to add more columns. Also, when adding more than 11 columns and using the whole data dataset, I got warnings "Warning messages: 1: Expected 27 pieces. Additional pieces discarded in 57140 rows " ..

I tried several "quote" patterns (e.g. " ", "\t",), as I thought that the problem was there..but it is not working. Any suggestion will be really appreciated.

Thanks

CodePudding user response：

Looking at your data it seems you want to split into 12 elements, assuming that you want to split everywhere where one or more whitespaces occur:

r_dat %>%
  separate(1, into = paste0("X", 1:12), sep = " {1,}")

r_dat %>%
  mutate(new_cols = str_split(.[[1]], " {1,}")) %>%
  unnest_wider(new_cols)

The first alternative would give:

# A tibble: 6 x 12
  X1    X2          X3    X4    X5         X6         X7         X8            X9            X10         X11         X12        
  <chr> <chr>       <chr> <chr> <chr>      <chr>      <chr>      <chr>         <chr>         <chr>       <chr>       <chr>      
1 240   19790111_00 7     7     0.86587346 0.75074303 1.35784054 "\n"          "-0.45948577" -1.18579698 -1.07059395 -0.34373909
2 243   19790111_03 0     7     0.85441613 0.72267860 1.31580353 "-0.44945070" "\n"          -1.16703987 -1.03862977 -0.32952571
3 246   19790111_06 7     7     0.83927369 0.69352823 1.27102554 "-0.43822104" "\n"          -1.14390016 -1.00381625 -0.31613618
4 249   19790111_09 0     7     0.82096398 0.66378951 1.22433603 "-0.42610571" "\n"          -1.11709785 -0.96681082 -0.30378893
5 252   19790111_12 7     7     0.79906243 0.63312817 1.17505550 "\n"          "-0.41290590" -1.08599937 -0.92708772 -0.29231820
6 255   19790111_15 0     7     0.77413946 0.60201460 1.12398231 "-0.39892274" "\n"          -1.05132735 -0.88528168 -0.28191039

Getting to your second point of converting to numeric, you could try:

r_dat %>%
  separate(1, into = paste0("X", 1:12), sep = " {1,}") %>%
  mutate(across(everything(), ~type.convert(.)))

which gives:

# A tibble: 6 x 12
     X1 X2             X3    X4    X5    X6    X7     X8     X9   X10    X11    X12
  <int> <chr>       <int> <int> <dbl> <dbl> <dbl>  <dbl>  <dbl> <dbl>  <dbl>  <dbl>
1   240 19790111_00     7     7 0.866 0.751  1.36 NA     -0.459 -1.19 -1.07  -0.344
2   243 19790111_03     0     7 0.854 0.723  1.32 -0.449 NA     -1.17 -1.04  -0.330
3   246 19790111_06     7     7 0.839 0.694  1.27 -0.438 NA     -1.14 -1.00  -0.316
4   249 19790111_09     0     7 0.821 0.664  1.22 -0.426 NA     -1.12 -0.967 -0.304
5   252 19790111_12     7     7 0.799 0.633  1.18 NA     -0.413 -1.09 -0.927 -0.292
6   255 19790111_15     0     7 0.774 0.602  1.12 -0.399 NA     -1.05 -0.885 -0.282