Using R, I want to pick last row data only from over 2000 csv in the same directory and make single dataframe.
Directory = "C:\data
File name, for example '123456_p' (6 number digit)
Each csv has different number of rows, but has the same number of columns (10 columns)
I know the tail and list function, but over 2000 dataframes, inputting manually is time wasting.
Is there any way to do this with loop through R?
As always, I really appreciate your help and support
CodePudding user response:
There are four things you need to do here:
- Get all the filenames we want to read in
- Read each in and get the last row
- Loop through them
- Bind them all together
There are many options for each of these steps, but let's use purrr for the looping and binding, and base-R for the rest.
- Get all the filenames we want to read in
You can do this with the list.files()
function.
filelist = list.files(pattern = '.csv')
will generate a vector of filenames for all CSV files in the working directory. Edit as appropriate to specify the pattern further or target a different directory.
- Read each in and get the last row
The read.csv()
function can read in each file (if you want it to go faster, use data.table::fread()
instead), and as you mentioned tail()
can get the last row. If you build a function out of this it will be easier to loop over, or change the process if it turns out you need another step of cleaning.
read_one_file = function(x) {
tail(read.csv(x), 1)
}
- Loop through them
- Bind them all together
You can do both of these steps at once with map_df()
in the purrr package.
library(purrr)
final_data = map_df(filelist, read_one_file)