rename column as file name - multiple csv files R

Time:10-16

I have 200 csv files with names "a.csv", "b.csv", "c.csv", etc.

In each csv file, there are two columns: "type" and "abundance". I'd like to change the name of the abundance column in each csv file to "a_abundance", "b_abundance", etc. to match the file name, and then save the csv file with the new column names.

So far I have the following, but it doesn't work.

filenames<- list.files(pattern = ".csv")

all_files <- lapply (filenames, function (x) {
  file <- read.csv (x) 
  name= sub(".*", "", x) 
  
  colnames(file) <- paste (colnames(file), name, sep ='_') 
  
 return(file)
})

CodePudding user response:

Something like this:

all_files <- lapply(setNames(nm=filenames), function(fn) {
  dat <- read.csv(fn)
  ind <- colnames(dat) == "abundance"
  if (any(ind)) {
    colnames(dat)[ind] <- paste0(tools::file_path_sans_ext(basename(fn)), "_abundance")
  }
  dat
})

The above will read the data and change the one column name. (You said just one column, but your code is changing all columns ... I'll stick with just the one named "abundance".)

From here, you can write the files back out with one of:

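# row.names = FALSE keeps write.csv from adding an extra column of row numbers
Map(write.csv, all_files, names(all_files), row.names = FALSE)
## or ##
for (nm in names(all_files)) write.csv(all_files[[nm]], nm, row.names = FALSE)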
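
For a quick check before touching the real files, here is a minimal end-to-end sketch of the read, rename, write-back cycle in a throwaway directory. The sample files a.csv and b.csv, their contents, and the helper name rename_abundance are made up for illustration; row.names = FALSE is used so that write.csv does not write an extra row-number column.

```r
# Work in a temporary directory so real data is never touched (illustrative setup).
tmp <- file.path(tempdir(), "rename_demo")
dir.create(tmp, showWarnings = FALSE)

# Two made-up input files with the columns described in the question.
write.csv(data.frame(type = c("x", "y"), abundance = c(1, 2)),
          file.path(tmp, "a.csv"), row.names = FALSE)
write.csv(data.frame(type = c("x", "z"), abundance = c(3, 4)),
          file.path(tmp, "b.csv"), row.names = FALSE)

# Hypothetical helper: rename only the "abundance" column after the file name.
rename_abundance <- function(fn) {
  dat <- read.csv(fn)
  ind <- colnames(dat) == "abundance"
  if (any(ind)) {
    prefix <- tools::file_path_sans_ext(basename(fn))
    colnames(dat)[ind] <- paste0(prefix, "_abundance")
  }
  dat
}

# "\\.csv$" matches a literal ".csv" at the end of the file name only.
filenames <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)
all_files <- lapply(setNames(nm = filenames), rename_abundance)

# Write each data frame back over its source file, without row names.
for (nm in names(all_files)) {
  write.csv(all_files[[nm]], nm, row.names = FALSE)
}

# Re-read the headers to confirm the change.
headers <- lapply(filenames, function(fn) colnames(read.csv(fn)))
print(headers)
```

Once the headers look right on the sample files, the same loop can be pointed at the real directory by changing tmp.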

FYI, this could be done a lot faster on the command-line (bash shell or similar, as long as sed is available) with something like:

for fn in *.csv ; do
  BN=$(basename "$fn" .csv)
  sed -i -E "1{s/abundance/${BN}_abundance/}" "$fn"
done

Walk-through:

  • For BN, basename removes any leading directory component, and the trailing .csv argument tells it to strip that extension as well; this turns ./a.csv into a.
  • For sed:
    • -i makes the modification in place on the file; note that this does not keep a backup of the original. If you use -i.bak instead, sed backs up each file before modifying it, which is perhaps safer the first time you try this; you can remove the *.bak files afterwards. (The bare -i form is for GNU sed; BSD/macOS sed requires a suffix argument after -i, for example -i '' for no backup.)
    • -E enables extended regular expressions; this simple pattern does not need it, so you can drop it (or use -e, which only marks the next argument as the script); it's just habit for me
    • 1 means to apply this rule only on the first line of the file
    • s/from/to/ replaces the first match of the from pattern with the to text, in this case prepending ${BN}_ (the braces are needed here, since bash would read $BN_abundance as a variable named BN_abundance)
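
If sed is not available, the same first-line-only substitution can be sketched in R by editing the header as text instead of parsing the whole table. This is an illustration, not a drop-in replacement: the function name rename_header and the sample file are hypothetical, and, like the sed command, it replaces the first occurrence of abundance anywhere in the header line.

```r
# Hypothetical helper: rewrite only line 1 of a CSV, leaving the data rows untouched.
rename_header <- function(fn, backup = TRUE) {
  lines <- readLines(fn)
  if (length(lines) == 0) return(invisible(FALSE))  # empty file: nothing to do
  # Keep a copy of the original, similar in spirit to sed -i.bak.
  if (backup) file.copy(fn, paste0(fn, ".bak"), overwrite = TRUE)
  prefix <- tools::file_path_sans_ext(basename(fn))
  # sub() replaces the first match only, like s/from/to/ without the g flag.
  lines[1] <- sub("abundance", paste0(prefix, "_abundance"), lines[1], fixed = TRUE)
  writeLines(lines, fn)
  invisible(TRUE)
}

# Made-up sample file in a temporary directory.
demo_file <- file.path(tempdir(), "c.csv")
writeLines(c('"type","abundance"', '"x",5', '"abundance",6'), demo_file)

rename_header(demo_file)
result <- readLines(demo_file)
print(result)
```

As with the sed loop, running this twice on the same file prefixes the header twice, so it is worth keeping the .bak copies until the result has been checked.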