I am struggling to read properly a .txt. For that, I am using:
r_dat <- read_csv(ff,quote = ",", skip=5)
And the data looks like:
dput(head(r_dat))
structure(list(`--------------------------------------------------------` =
c("240 19790111_00 7 7 0.86587346 0.75074303 1.35784054
-0.45948577 -1.18579698 -1.07059395 -0.34373909",
"243 19790111_03 0 7 0.85441613 0.72267860 1.31580353 -0.44945070
-1.16703987 -1.03862977 -0.32952571",
"246 19790111_06 7 7 0.83927369 0.69352823 1.27102554 -0.43822104
-1.14390016 -1.00381625 -0.31613618",
"249 19790111_09 0 7 0.82096398 0.66378951 1.22433603 -0.42610571
-1.11709785 -0.96681082 -0.30378893",
"252 19790111_12 7 7 0.79906243 0.63312817 1.17505550
-0.41290590 -1.08599937 -0.92708772 -0.29231820",
"255 19790111_15 0 7 0.77413946 0.60201460 1.12398231 -0.39892274
-1.05132735 -0.88528168 -0.28191039"
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
So, it results a single a character column that would contain, time, hour, and some numeric values, each one should go into a different column.
In order to properly handle the data, I need to split the column into 11 columns. So, after reading the file, I tried to use separate the column:
r_dat%>%separate(1, into=paste("X",seq(1,11,by=1), sep=""), sep=" ")
But it is not working, and it seems that I need to add more columns. Also, when adding more than 11 columns and using the whole data dataset, I got warnings "Warning messages: 1: Expected 27 pieces. Additional pieces discarded in 57140 rows " ..
I tried several "quote" patterns (e.g. " ", "\t",), as I thought that the problem was there..but it is not working. Any suggestion will be really appreciated.
Thanks
CodePudding user response:
Looking at your data it seems you want to split into 12 elements, assuming that you want to split everywhere where one or more whitespaces occur:
r_dat %>%
separate(1, into = paste0("X", 1:12), sep = " {1,}")
or
r_dat %>%
mutate(new_cols = str_split(.[[1]], " {1,}")) %>%
unnest_wider(new_cols)
The first alternative would give:
# A tibble: 6 x 12
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 240 19790111_00 7 7 0.86587346 0.75074303 1.35784054 "\n" "-0.45948577" -1.18579698 -1.07059395 -0.34373909
2 243 19790111_03 0 7 0.85441613 0.72267860 1.31580353 "-0.44945070" "\n" -1.16703987 -1.03862977 -0.32952571
3 246 19790111_06 7 7 0.83927369 0.69352823 1.27102554 "-0.43822104" "\n" -1.14390016 -1.00381625 -0.31613618
4 249 19790111_09 0 7 0.82096398 0.66378951 1.22433603 "-0.42610571" "\n" -1.11709785 -0.96681082 -0.30378893
5 252 19790111_12 7 7 0.79906243 0.63312817 1.17505550 "\n" "-0.41290590" -1.08599937 -0.92708772 -0.29231820
6 255 19790111_15 0 7 0.77413946 0.60201460 1.12398231 "-0.39892274" "\n" -1.05132735 -0.88528168 -0.28191039
Getting to your second point of converting to numeric, you could try:
r_dat %>%
separate(1, into = paste0("X", 1:12), sep = " {1,}") %>%
mutate(across(everything(), ~type.convert(.)))
which gives:
# A tibble: 6 x 12
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
<int> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 240 19790111_00 7 7 0.866 0.751 1.36 NA -0.459 -1.19 -1.07 -0.344
2 243 19790111_03 0 7 0.854 0.723 1.32 -0.449 NA -1.17 -1.04 -0.330
3 246 19790111_06 7 7 0.839 0.694 1.27 -0.438 NA -1.14 -1.00 -0.316
4 249 19790111_09 0 7 0.821 0.664 1.22 -0.426 NA -1.12 -0.967 -0.304
5 252 19790111_12 7 7 0.799 0.633 1.18 NA -0.413 -1.09 -0.927 -0.292
6 255 19790111_15 0 7 0.774 0.602 1.12 -0.399 NA -1.05 -0.885 -0.282