I am new to R and this is a bit difficult for me.
I need to open multiple txt (all have the same number of variables and variables names, see picture[[1]); to each, I need to add multiple columns with specific parts of their file name. Finally, I need to save a CSV resulting from merging all txts (with the added coloums corresponding to each txt unique file name).
The file name looks like NVR_ch2_main_20220505140000_20220505150000 and I would need three coloumn, Month (05, the first of the two after 2022), Day (05) and hour (14).
Any help?
I got stacked quite early, just managed the process with the whole file name for one file...(I am very sorry, I am truly new to R)
Thanks so much to anyone who could help me with this
read_tsv('NVR_ch2_main_20220505132105_20220505140000.txt') %>%
mutate(filename = 'NVR_ch2_main_20220505132105_20220505140000.txt') %>%
select(filename, everything ()) %>%
write_csv('C:/Users/marta/Documents/R/Crete/NVR_ch2_main_20220505132105_20220505140000.csv')
CodePudding user response:
Try this. I can't promise it will work since I don't have your data, but this should find all files with the common prefix, extract the month day and hour from the file name, and then read all the data in as one combined file:
library(tidyverse)
tibble(file = list.files(pattern = "NVR.*\\.txt")) |>
extract(file,
into = c("month", "day", "hour"),
regex = ".*\\w_\\d{4}(\\d{2})(\\d{2})(\\d{2})\\d{4}_",
remove = FALSE,
convert = TRUE) |>
mutate(data = map(file, read_tsv)) |>
unnest(data)
CodePudding user response:
Sorry for the extra questions. I think I understand now. Does this do what you want?
library(tidyverse)
list_of_files <- list.files("C:/Users/marta/Documents/R/Crete", pattern="tsv", full.names=T)
fun1 <- function(x) {
df1 <- read_tsv(x)
newname <- gsub("tsv", "csv", x)
df1 <- df1 %>% mutate(month = str_sub(x, 18, 19), day = str_sub(x, 20, 21), hour = str_sub(x, 22, 23))
write.csv(del1, newname)
}
lapply(list_of_files, fun1)
You may have to adjust the numbers that are in the str_sub()
calls so that you get the right position from the file names.
Note the the columns will be probably be characters. If you need numbers for month
, day
, and hour
, you should change str_sub()
to as.numeric(str_sub())
.