Read a txt file to grep information over a list of files

Time:10-07

I have 20 files with different types of information for a list of cities.

For all the files, I want to grep the information of a given city, so, if for example I have three cities (NewYork, Barcelona, Rome), I would like to generate three files (NewYork.txt, Barcelona.txt, Rome.txt), with all the information in the 20 files for those cities.

I can do this easily with the script:

#!/bin/bash

list=("NewYork" "Barcelona" "Rome")

for i in "${list[@]}"
do
   echo "$i"
   zgrep -Hx "$i" *.vcf.gz > "$i.txt"
done
echo "Done"

However, there are two difficulties:

  • First, the list of cities is huge, so I need the script to read a txt file with the list of cities for which I want the data, instead of creating the list manually inside the script.
  • Secondly, the files with the information of the cities are in the folder C:/Users/Roy/DataReceived, and I want to store the output .txt files in C:/Users/Roy/Documents/Results.

This is the script I wrote:

#!/bin/bash

for FILE in C:/Users/Roy/DataReceived/*; do
   while read i; do
      zgrep -Hwi $i *.vcf.gz > $i.txt
   done < list_of_cities.txt
done

This script would be stored in C:/Users/Roy/Documents/Results, so that's where the files would be created.

However, I'm getting the error 'gzip: *.vcf.gz: No such file or directory'. It's not recognizing the file, so I guess there's something wrong with the path.

CodePudding user response:

I think your script is wrong:

#!/bin/bash
for FILE in C:/Users/Roy/DataReceived/*; do
   while read i; do
      zgrep -Hwi $i *.vcf.gz > $i.txt
   done < list_of_cities.txt
done
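  1. Windows path: C:/Users is a Windows-style path. In Git Bash the drive is mounted as /c, so the native form is /c/Users (other environments differ, for example /mnt/c under WSL). The Windows form may work, but probably not as you intend.
  2. You loop over every file in DataReceived, but you never use the FILE variable, so the inner loop just repeats the same work once per file.
  3. Your zgrep is not working as you intend because *.vcf.gz is expanded against the current directory ($PWD), not DataReceived. Nothing there matches, so bash passes the pattern through literally, and gzip complains that no file named *.vcf.gz exists.
  4. Your cities may contain spaces in their names (New York and so on), but you don't quote $i, so such a name is split into several words.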
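
Point 3 is easy to reproduce. The following is only a small demonstration, run in an empty scratch directory that I create for the purpose; the variable names (scratch, unmatched, matched) are mine and not part of your script. It shows what bash does with a pattern that matches nothing, with and without the nullglob option:

```bash
#!/bin/bash
# Work in an empty scratch directory so the pattern is guaranteed to match nothing.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Default behaviour: an unmatched glob is passed through unchanged.
unmatched=(*.vcf.gz)
echo "default:  ${#unmatched[@]} word(s): ${unmatched[*]}"

# With nullglob, an unmatched glob expands to nothing at all.
shopt -s nullglob
matched=(*.vcf.gz)
echo "nullglob: ${#matched[@]} word(s)"
shopt -u nullglob
```

The first line reports one word, the literal text *.vcf.gz, which is exactly what zgrep handed to gzip in your case. The second reports zero words.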

I think you should rewrite your script like this:

while read -r city; do
  zgrep -Hwi "$city" *.vcf.gz > "${city}.txt"
done < list_of_cities.txt

The quotes ensure that a space in the name does not cause problems, although a name may still contain characters that are problematic in filenames (on Windows these are /, \, :, ?, ", *, <, > and |).
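
If some of your city names do contain those characters, one possible workaround is to clean the name before using it as a filename. This is only a sketch: safe_name is a hypothetical helper of mine, and replacing each character with an underscore is an arbitrary choice.

```bash
#!/bin/bash
# safe_name (hypothetical helper): print the argument with every character
# that Windows forbids in file names replaced by an underscore.
# The first tr set lists nine characters: / \ : ? " * < > |
# The second set has nine underscores, one per character.
safe_name() {
  printf '%s\n' "$1" | tr '/\\:?"*<>|' '_________'
}

safe_name 'New York'      # spaces are left alone
safe_name 'Rome: A/B?'    # colon, slash and question mark are replaced
```

Inside the loop you would then redirect to "$(safe_name "$city").txt" while still searching for the unmodified "$city".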

The script will look for all *.vcf.gz files in the current directory.

[edit] If you need to change the input/output directory, you can do it like this:

while read -r city; do
  zgrep -Hwi "$city" "${INPUT_DIRECTORY}/"*.vcf.gz > "${OUTPUT_DIRECTORY}/${city}.txt"
done < list_of_cities.txt

Either you hardcode the paths before the while read loop:

declare INPUT_DIRECTORY=/c/Users/Roy/DataReceived
declare OUTPUT_DIRECTORY=/c/Users/Roy/Documents/Results

Or, if you want to be able to set them at invocation, fall back to the current directory when they are empty or unset:

[[ -z "${INPUT_DIRECTORY}" ]] && INPUT_DIRECTORY="$PWD"
[[ -z "${OUTPUT_DIRECTORY}" ]] && OUTPUT_DIRECTORY="$PWD"
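
The same fallback can also be written with the ${VAR:=default} parameter expansion, which assigns the default only when the variable is unset or empty. This is just an equivalent spelling, not a different behaviour; the two echo lines are there only to show the result.

```bash
#!/bin/bash
# ":" is the no-op builtin; the expansion itself performs the assignment.
: "${INPUT_DIRECTORY:=$PWD}"
: "${OUTPUT_DIRECTORY:=$PWD}"

echo "reading from: $INPUT_DIRECTORY"
echo "writing to:   $OUTPUT_DIRECTORY"
```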

Then you can invoke your script like this:

INPUT_DIRECTORY=/c/Users/Roy/DataReceived OUTPUT_DIRECTORY=/c/Users/Roy/Documents/Results ./yourscript.bash

Or:

export INPUT_DIRECTORY=/c/Users/Roy/DataReceived
export OUTPUT_DIRECTORY=/c/Users/Roy/Documents/Results
./yourscript.bash
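
Putting the pieces together, the whole thing could look roughly like the sketch below. It is not tested against your data. The function name grep_cities and its three positional arguments are my own invention, and the handling of blank lines, Windows line endings and a missing final newline are extras I am assuming you might want, since the list was probably written on Windows.

```bash
#!/bin/bash
# grep_cities INPUT_DIR OUTPUT_DIR LIST_FILE   (hypothetical helper)
# For every city named in LIST_FILE, search all *.vcf.gz files in INPUT_DIR
# and write the matching lines to OUTPUT_DIR/<city>.txt.
grep_cities() {
  local input_dir=$1 output_dir=$2 list_file=$3 city
  mkdir -p "$output_dir" || return 1
  # "|| [[ -n $city ]]" keeps a last line that lacks a trailing newline.
  while IFS= read -r city || [[ -n $city ]]; do
    city=${city%$'\r'}            # drop a CR left by Windows line endings
    [[ -z $city ]] && continue    # skip blank lines
    zgrep -Hwi "$city" "$input_dir"/*.vcf.gz > "$output_dir/$city.txt"
  done < "$list_file"
}

# Example call, using the paths from the question:
# grep_cities /c/Users/Roy/DataReceived /c/Users/Roy/Documents/Results list_of_cities.txt
```

Passing the directories as arguments instead of environment variables is only a matter of taste; it keeps the function usable from other scripts without exporting anything.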