Let's say I have a file named duplicates.txt
that looks like this:
ID-32532
ID-78313
ID-89315
I also have a directory Fastq
of files with the following names:
ID-18389_Feb92003_R1.fastq
ID-18389_Feb92003_R2.fastq
ID-32532_Feb142003_R1.fastq
ID-32532_Feb142003_R2.fastq
ID-48247_Mar202004_R1.fastq
ID-48247_Mar202004_R2.fastq
I want a command that reads each ID in duplicates.txt,
finds every file in the Fastq
directory whose name partially matches that ID, and removes it. In the example above, this would remove the files named ID-32532_Feb142003_{R1,R2}.fastq.
What Unix command should I use? If need be, I could write a script in Python.
CodePudding user response:
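In a Unix shell, just replace the variable characters with a glob wildcard: '?' matches exactly one character and '*' matches any run of characters. ('.*' is the regular-expression form, which tools like grep use but the shell does not.) Note that the patterns below match every ID, so put a specific ID in place of the '?????' to remove only that sample's files.
duplicates.txt
ID-?????
Fastq
rm Fastq/ID-?????_*_R?.fastq
rm Fastq/ID-*.fastq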
CodePudding user response:
Here's a little bash function to do it:
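lrmduplicates(){
    # Read one ID per line; the || test also picks up a last line with no trailing newline.
    while IFS= read -r dupe || [ -n "$dupe" ];
    do
        # Skip blank lines: an empty ID would make the glob below match every file.
        [ -n "$dupe" ] || continue
        echo "removing $dupe"
        # Fine-tune with ls first...
        #ls Fastq/"$dupe"*
        # Quote the ID so it is not word-split; -- guards against names starting with a dash.
        rm -- Fastq/"$dupe"*
    done < duplicates.txt
}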
For an extra bonus, suppress the error when there is no match. I'm not sure how to do that myself: rm -f
or rm 2>/dev/null
didn't do it (zsh on macOS).
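On the no-match error: in zsh the "no matches found" message comes from the shell itself when the glob fails to expand, before rm ever runs, which is why rm -f and redirecting rm's stderr make no difference. One possible workaround, sketched below, is to let find do the matching instead of the shell. The function name rmdupes_find and its two arguments are made up for illustration, and -maxdepth is an extension assumed to be available (it is in both GNU and BSD/macOS find).

```shell
# Hypothetical helper (not from the answer above): delete every file in a
# directory whose name starts with an ID listed in a text file.
# Usage: rmdupes_find <list-file> <directory>
rmdupes_find() {
    list=$1
    dir=$2
    # The || test keeps a final line that lacks a trailing newline.
    while IFS= read -r id || [ -n "$id" ]; do
        # Skip blank lines so an empty ID can never match every file.
        [ -n "$id" ] || continue
        # The pattern is quoted, so the shell passes it to find untouched.
        # find does the matching itself; an ID with no matching files
        # simply results in nothing being removed and no error message.
        find "$dir" -maxdepth 1 -type f -name "${id}*" -exec rm -- {} +
    done < "$list"
}
```

With the files from the question, running rmdupes_find duplicates.txt Fastq should remove only the two ID-32532 files and stay silent for ID-78313 and ID-89315. If you only ever run this in zsh, appending the (N) glob qualifier to the pattern in the original function is another option.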