Renaming a .tsv file with bash if a .json file with the same name contains a certain string

Time:07-26

For each subject I have a folder with two files (.json and .tsv) per task (gram, plaus, and sem), for a total of 6 files per subject. Each pair of .tsv/.json files has the same name apart from the file extension. For example, one subject's folder might contain: xxx.tsv, xxx.json, yyy.tsv, yyy.json, zzz.tsv, zzz.json.

I want to look through each .json file, see whether it contains the string "Gram", "Plaus", or "Sem", and rename the corresponding .tsv file to contain _Gram, _Plaus, or _Sem before the file extension based on which is found. Right now, my code (after changing to my subject folder) looks like this:

find -type f -name "*_regressors.json" -print0 | while IFS= read -r -d '' filename
do
    if [[grep -q 'Sem' "$filename"]]; then
        sem_name="${filename%.*}"
        mv ${sem_name}.tsv ${sem_name}_sem.tsv
    fi

    if [[grep -q 'Plaus' "$filename"]]; then
        plaus_name="${filename%.*}"
        mv ${plaus_name}.tsv ${plaus_name}_plaus.tsv
    fi

    if [[grep -q 'Gram' "$filename"]]; then
        gram_name="${filename%.*}"
        mv ${gram_name}.tsv ${gram_name}_gram.tsv
    fi
done

I'm wondering if an awk command might work better? I'm new to scripting with bash and unix in general, so any ideas are much appreciated!

CodePudding user response:

It does make sense to use awk instead of grep in this case:

#!/bin/bash

find . -type f -name "*_regressors.json" -print0 |
while IFS= read -r -d '' filename
do
    suffix=$(
        awk '
            match($0,/Sem|Plaus|Gram/) {
                print tolower(substr($0,RSTART,RLENGTH))
                exit
            }
        ' "$filename"
    )
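    # rename the matching .tsv (not the .json itself); skip files with no match
    [ -n "$suffix" ] && mv -- "${filename%.*}.tsv" "${filename%.*}_$suffix.tsv"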
done

Note, however, that matching a literal string inside a JSON file without parsing it might yield unexpected results, for example when the string also occurs inside an unrelated key or value.
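To illustrate that caveat, here is a minimal sketch. The JSON content and the field names (Description, TaskName) are made up for the demonstration and are not taken from the question. It shows an unanchored pattern picking up "Sem" inside another word, while whole-word matching with grep -w finds the standalone task name:

```shell
#!/bin/bash
# Hypothetical sidecar content: "Sem" only occurs inside the word "Semester".
tmp=$(mktemp)
printf '%s\n' '{"Description": "Semester 2 pilot", "TaskName": "Gram"}' > "$tmp"

# Unanchored pattern: the first match is the "Sem" in "Semester".
loose=$(grep -oE 'Sem|Plaus|Gram' "$tmp" | head -n 1)

# Whole-word matching (-w) skips "Semester" and finds the standalone "Gram".
strict=$(grep -owE 'Sem|Plaus|Gram' "$tmp" | head -n 1)

echo "loose=$loose strict=$strict"
rm -f "$tmp"
```

Whole-word matching is still a text match, so a JSON-aware tool such as jq would be more robust if the task name lives in a known field.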

CodePudding user response:

Would you please try the following:

#!/bin/bash

find . -type f -name "*_regressors.json" -print0 | while IFS= read -r -d '' f; do
    str=$(grep -oE "\b(Sem|Plaus|Gram)\b" "$f")                   # search the json file for the strings
    if (( $? == 0 )); then                                        # $? is 0 if grep matched
        str=$(head -n 1 <<< "$str" | tr '[:upper:]' '[:lower:]')  # pick the 1st match and lower the case
        base=${f%.json}                                           # remove the extension
        echo mv -- "${base}.tsv" "${base}_${str}.tsv"             # print the rename command (dry run)
    fi
done
  • The head command keeps only the first match in case there are several. (This may be overly cautious.)
  • If the printed commands look good, drop echo before mv and run.
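
As a side note, the exit-status check can also be written by handing grep straight to if, which is probably what the `[[grep -q ...]]` lines in the question were aiming for: if runs a command and tests its exit status, so no [[ ]] is needed. A minimal sketch follows; rename_by_task is a hypothetical helper name, and it keeps the echo so nothing is renamed until you remove it:

```shell
#!/bin/bash
# rename_by_task is a hypothetical helper, introduced only for this sketch.
# It prints the mv command for the .tsv that belongs to the given .json file.
rename_by_task() {
    local json="$1"
    local base="${json%.json}"
    local task lower
    for task in Sem Plaus Gram; do
        # `if` tests grep's exit status directly: 0 means a match was found.
        if grep -qw -- "$task" "$json"; then
            lower=$(printf '%s' "$task" | tr '[:upper:]' '[:lower:]')
            echo mv -- "${base}.tsv" "${base}_${lower}.tsv"   # dry run
            return 0
        fi
    done
    return 1    # none of the task names occurred as a whole word
}
```

It could be called from the same find loop as above, e.g. rename_by_task "$f" in place of the loop body.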