Regex capture group works in regex101, not sed

Time:02-28

I found some other similarly titled questions but didn't find the answer.

My text is:

##bcftools_mergeCommand=merge --force-samples -m none -O v -o analysis/STUDY1/hg19/exome/merged.vcf --threads 4 analysis/STUDY1/hg19/exome/varscan_norm.vcf.gz analysis/STUDY1/hg19/exome/gatk_norm.vcf.gz analysis/STUDY1/hg19/exome/samtools_norm.vcf.gz analysis/STUDY1/hg19/exome/freebayes_norm.vcf.gz

I want the names of the .vcf.gz files.

I tried this sed command:

echo "##bcftools_mergeCommand=merge --force-samples -m none -O v -o analysis/STUDY1/hg19/exome/merged.vcf --threads 4 analysis/STUDY1/hg19/exome/varscan_norm.vcf.gz analysis/STUDY1/hg19/exome/gatk_norm.vcf.gz analysis/STUDY1/hg19/exome/samtools_norm.vcf.gz analysis/STUDY1/hg19/exome/freebayes_norm.vcf.gz" | sed -En 's/\/([^\/]+\.vcf\.gz)/\1/g'

but it prints nothing.

Regex101 gives:

https://regex101.com/r/h3OGvN/1

CodePudding user response:

Why not use grep?

$ data='##bcftools_mergeCommand=merge --force-samples -m none -O v -o analysis/STUDY1/hg19/exome/merged.vcf --threads 4 analysis/STUDY1/hg19/exome/varscan_norm.vcf.gz analysis/STUDY1/hg19/exome/gatk_norm.vcf.gz analysis/STUDY1/hg19/exome/samtools_norm.vcf.gz analysis/STUDY1/hg19/exome/freebayes_norm.vcf.gz'
$ echo "$data" | grep -Eo '[^\/]+\.vcf\.gz'
varscan_norm.vcf.gz
gatk_norm.vcf.gz
samtools_norm.vcf.gz
freebayes_norm.vcf.gz

  • -E: Interpret patterns as extended regular expressions.
  • -o: Print only the matched (non-empty) parts.

CodePudding user response:

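The regex dialect supported by Regex101 is different from the one sed understands, so a pattern that matches there is not guaranteed to behave the same way in sed.

Concretely, (take out the superfluous g flag and) add a p flag to fix this specific script: the -n option suppresses sed's default output, so without p nothing is printed even when the substitution succeeds. In the general case, don't rely on a tool which doesn't directly support the one you actually want to use.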
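As a rough sketch of what the p flag changes, here is the same substitution on a shortened sample line. The paths in line are made up for illustration and only mimic the shape of the real input. Note that the substitution alone still prints the whole line (it only drops the slash in front of each name), so the second command, which is one possible approach rather than the only one, first puts each word on its own line and then keeps just the file name.

```shell
# Hypothetical, shortened input with the same shape as the question's line.
line='merge -o out/merged.vcf a/b/varscan_norm.vcf.gz a/b/gatk_norm.vcf.gz'

# -n turns off automatic printing; the p flag prints lines where s/// matched.
# This prints the whole line with the slash before each .vcf.gz name removed.
fixed=$(printf '%s\n' "$line" | sed -En 's/\/([^\/]+\.vcf\.gz)/\1/gp')
printf '%s\n' "$fixed"

# To get only the names: one word per line, then strip everything up to
# the last slash on lines that end in .vcf.gz.
names=$(printf '%s\n' "$line" | tr ' ' '\n' | sed -En 's/.*\/([^\/]+\.vcf\.gz)$/\1/p')
printf '%s\n' "$names"
# varscan_norm.vcf.gz
# gatk_norm.vcf.gz
```

This assumes the file paths contain no spaces, which holds for the line in the question.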
