Remove multiple file extensions when using GNU parallel and cat in bash

Time:11-17

I have a CSV file (comma-separated) which contains:

file1a.extension.extension,file1b.extension.extension
file2a.extension.extension,file2b.extension.extension

The problem is that these files are named like file.extension.extension.

I'm trying to feed both columns to parallel while removing all extensions.

I tried some variations of:

cat /home/filepairs.csv | sed 's/\..*//' | parallel --colsep ',' echo column 1 = {1}.extension.extension column 2 =  {2} 

Which I expected to output

column 1 = file1a.extension.extension column 2 = file1b
column 1 = file2a.extension.extension column 2 = file2b

But outputs:

column 1 = file1a.extension.extension column 2 = 
column 1 = file2a.extension.extension column 2 =

The sed command is working, but it feeds only column 1 to parallel.

Answer:

As currently written, the sed command prints only one name per line:

$ sed 's/\..*//'  filepairs.csv
file1a
file2a

Where:

  • \. matches the first literal period (.)
  • .* matches the rest of the line (i.e., everything after the first literal period to the end of the line, including the comma and the second column)
  • // is the empty replacement, so everything from the first literal period to the end of the line is removed
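
The greedy match can be confirmed on a single line. This is a minimal sketch using one sample line in the same shape as the question's CSV; the variable names `line` and `greedy` are only for illustration. It should print just the first base name, file1a.

```shell
# Sample line in the same shape as the question's CSV (illustrative data).
line='file1a.extension.extension,file1b.extension.extension'

# .* is greedy: after the first period it also swallows the comma and
# the whole second column, so only the first base name survives.
greedy=$(printf '%s\n' "$line" | sed 's/\..*//')
printf '%s\n' "$greedy"
```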

I'm guessing what you really want is two names per line ... one sed idea:

$ sed 's/\.[^,]*//g'   filepairs.csv
file1a,file1b
file2a,file2b

Where:

  • \. matches a literal period (.)
  • [^,]* matches everything up to the next comma (or the end of the line)
  • //g replaces each match (the literal period plus everything after it, up to a comma or the end of the line) with nothing; the g flag repeats the substitution for every match on the line (in this case twice, once per column)
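
To check that both columns now survive, the cleaned lines can be split on the comma with a plain read loop. This is only a sketch: the loop stands in for parallel (it is not the OP's command), the input is a copy of the question's sample data, and `csv`, `col1`, `col2` and `result` are illustrative names.

```shell
# Illustrative input matching the question's CSV layout.
csv='file1a.extension.extension,file1b.extension.extension
file2a.extension.extension,file2b.extension.extension'

# [^,]* stops at the comma, so each column is stripped separately;
# IFS=, then splits the cleaned line into its two columns.
result=$(printf '%s\n' "$csv" | sed 's/\.[^,]*//g' |
  while IFS=, read -r col1 col2; do
    printf 'column 1 = %s.extension.extension column 2 = %s\n' "$col1" "$col2"
  done)
printf '%s\n' "$result"
```

Each output line should have the form the question asked for, with the second column reduced to its base name.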

NOTE: I don't have parallel on my system, so I'm unable to test that portion of the OP's code
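
For readers who do have GNU parallel installed, a sed expression that stops at the comma, such as 's/\.[^,]*//g', can be piped into the question's original command. Treat the following as an untested-by-the-answerer sketch: the temporary file stands in for /home/filepairs.csv, the -k (keep output order) flag is an addition not present in the question, and the guard simply skips the run when GNU parallel is not found.

```shell
# Build a throwaway copy of the question's CSV (illustrative data).
tmp=$(mktemp)
printf '%s\n' \
  'file1a.extension.extension,file1b.extension.extension' \
  'file2a.extension.extension,file2b.extension.extension' > "$tmp"

# Only run if GNU parallel is actually available on this machine.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  # -k keeps output in input order; --colsep ',' exposes the columns as {1} and {2}.
  out=$(sed 's/\.[^,]*//g' "$tmp" |
    parallel -k --colsep ',' echo column 1 = {1}.extension.extension column 2 = {2})
  printf '%s\n' "$out"
else
  echo 'GNU parallel not found; skipping' >&2
fi
rm -f "$tmp"
```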
