I have 20 files with different types of information for a list of cities.
From all of these files, I want to grep the information for a given city. For example, if I have three cities (NewYork, Barcelona, Rome), I would like to generate three files (NewYork.txt, Barcelona.txt, Rome.txt), each containing all the information for that city from the 20 files.
I can do this easily with the script:
#!/bin/bash
list=("NewYork" "Barcelona" "Rome")
for i in "${list[@]}"
do
echo "$i"
zgrep -Hx "$i" *.vcf.gz > "$i.txt"
done
echo "Done"
However, there are two difficulties:
- First, the list of cities is huge, so I need the script to read a txt file with the list of cities for which I want the data, instead of creating the list manually inside the script.
- Second, the files with the city information are in the folder C:/Users/Roy/DataReceived, and I want to store the output .txt files in C:/Users/Roy/Documents/Results.
This is the script I wrote:
#!/bin/bash
for FILE in C:/Users/Roy/DataReceived/*; do
while read i; do
zgrep -Hwi $i *.vcf.gz > $i.txt
done < list_of_cities.txt
done
This script would be stored in C:/Users/Roy/Documents/Results, so that's where the files would be created.
However, I'm getting the error 'gzip: *.vcf.gz: No such file or directory'. It's not recognizing the file, so I guess there's something wrong with the path.
CodePudding user response:
I think your script is wrong:
#!/bin/bash
for FILE in C:/Users/Roy/DataReceived/*; do
while read i; do
zgrep -Hwi $i *.vcf.gz > $i.txt
done < list_of_cities.txt
done
- Windows path: C:/Users is not a regular bash path, and you would have to use /c/Users (the form used by Git Bash, for instance). C:/Users may work, but probably not as you intend.
- You loop over all the files in DataReceived, but you never use the FILE variable you read.
- Your zgrep is not working as you intend because *.vcf.gz is evaluated against the current directory ($PWD), and in your example bash finds nothing because there is nothing there. Since the pattern matches no file, bash passes the literal string *.vcf.gz to zgrep, hence the "No such file or directory" error.
- Your city names may contain spaces (New York and so on), but you don't quote the variable.
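The point about *.vcf.gz being evaluated against the current directory can be checked with a small sketch. The helper name has_archives is made up for illustration and is not part of the original script; it relies on the default shell behaviour of leaving an unmatched pattern as a literal string:

```shell
# Hypothetical helper: succeeds if DIR contains at least one *.vcf.gz file.
# Usage: has_archives DIR
has_archives() {
for f in "$1"/*.vcf.gz; do
# When nothing matches, the loop runs once with the literal,
# unexpanded pattern "DIR/*.vcf.gz", which is not an existing file.
[ -e "$f" ] && return 0
done
return 1
}
```

Running has_archives . from the Results folder would presumably fail, while has_archives /c/Users/Roy/DataReceived should succeed if the data is where the question says it is.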
I think you should rewrite your script like this:
while read -r city; do
zgrep -Hwi "$city" *.vcf.gz > "${city}.txt"
done < list_of_cities.txt
The quotes at least ensure that spaces in the name do not cause problems, although some characters remain problematic in file names (on Windows these are /, \, :, ?, ", *, <, > and |).
The script will look for all *.vcf.gz files in the current directory.
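If some city names might contain characters that are not allowed in file names, one option is to replace them before building the output name. This is only a sketch: safe_name is a hypothetical helper, and using _ as the replacement is an arbitrary choice.

```shell
# Hypothetical helper: print NAME with the characters that Windows
# forbids in file names (/ \ : ? " * < > |) replaced by underscores.
# Usage: safe_name NAME
safe_name() {
# The first set lists the nine forbidden characters ("\\" is one
# backslash for tr); the second set holds nine underscores.
printf '%s' "$1" | tr '/\\:?"*<>|' '_________'
}
```

Inside the loop, the redirection could then be written as > "$(safe_name "$city").txt", while the search pattern stays "$city".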
[edit] If you need to change the input/output directories, you can do it like this:
while read -r city; do
zgrep -Hwi "$city" "${INPUT_DIRECTORY}/"*.vcf.gz > "${OUTPUT_DIRECTORY}/${city}.txt"
done < list_of_cities.txt
Either you hardcode the paths before the while read loop:
declare INPUT_DIRECTORY=/c/Users/Roy/DataReceived
declare OUTPUT_DIRECTORY=/c/Users/Roy/Documents/Results
Or, if you want to set them at invocation time, default them to the current directory when they are unset:
[[ -z "${INPUT_DIRECTORY}" ]] && INPUT_DIRECTORY="$PWD"
[[ -z "${OUTPUT_DIRECTORY}" ]] && OUTPUT_DIRECTORY="$PWD"
Then you can invoke your script like this:
INPUT_DIRECTORY=/c/Users/Roy/DataReceived OUTPUT_DIRECTORY=/c/Users/Roy/Documents/Results ./yourscript.bash
Or:
export INPUT_DIRECTORY=/c/Users/Roy/DataReceived
export OUTPUT_DIRECTORY=/c/Users/Roy/Documents/Results
./yourscript.bash
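Putting the pieces together, the loop could be wrapped in a function along these lines. This is a sketch under a few assumptions rather than a definitive version: the name collect_cities and its three arguments are made up for illustration, the carriage-return stripping assumes the city list may have been saved with Windows line endings, and empty lines in the list are skipped.

```shell
# Hypothetical helper: for every city in LIST_FILE, search all *.vcf.gz
# files in INPUT_DIR and write the matches to OUTPUT_DIR/<city>.txt.
# Usage: collect_cities LIST_FILE INPUT_DIR OUTPUT_DIR
collect_cities() {
list_file=$1
input_dir=$2
output_dir=$3
cr=$(printf '\r')

# The "|| [ -n ... ]" part also handles a last line without a newline.
while IFS= read -r city || [ -n "$city" ]; do
city=${city%"$cr"}          # drop a trailing carriage return, if any
[ -n "$city" ] || continue  # skip empty lines
# Same zgrep call as above, with both directories made explicit.
zgrep -Hwi "$city" "$input_dir"/*.vcf.gz > "$output_dir/$city.txt"
done < "$list_file"
}
```

With the paths from the question, the call would be collect_cities list_of_cities.txt /c/Users/Roy/DataReceived /c/Users/Roy/Documents/Results. Note that zgrep returns a non-zero status for a city with no match, which still leaves an empty .txt file behind.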