How to rename files using certain string inside each file?-CodePudding

I have a list of html files that contain a certain tag in each file as below:

<div id="myID" style="display:none">1_34876</div>

I would like to search for that tag in each file and rename each file according to the number within that tag, i.e rename the file containing the tag above to 1_34876.html and so forth..

Is there a regex or bash command using grep or awk that can accomplish this?

So far I was able to grep each file using the following command but stuck on how to rename the files:

grep '<div id="myID" style="display:none">.*</div>' ./*.html

An additional bonus would be if the command doesn't overwrite duplicate files, e.g. if another file contains the 1_34876 tag above then the second file would be renamed as 1_34876 v2.html or something similar.

Kindly advice if this can be achieved in a way that doesn't require programming.

Many thanks indeed. Ali

CodePudding user response：

You can use the following script to achieve your goal. Note, for the script to work on macOS, you either have to install GNU grep via Homebrew, or substitute the grep call with ggrep.

The script will search the current directory and all its subdirectories for *.html files.
It will substitute only the names of the files that contain the specific tag.
For multiple files that containt the same tag, each subsicuent file apart from the first will have an identifier appended to its name. E.g., 1_234.html, 1_234_1.html, 1_234_2.html
For files that contain multiple tags, the first tag encountered will be used.

#!/bin/bash

rename_file ()
{
    # Check that file name received is a non zero string
    file_name="$(realpath "${1}")"
    if [ -f "${file_name}" ]; then
        echo "No argument or non existing file or non regular file provided"
        exit 1
    fi

    # Get the tag number. If the number does not exist, the variable tag will be
    # empty. The first tag on a file will be used if there are multiple tags
    # within a file.
    tag="$(grep -oP -m 1 '(?<=<div id="myID" style="display:none">).*?(?=</div>)' \
        -- "${file_name}")"

    # Rename the file only if it contained a tag
    if [ ! -z "${tag}" ]; then
        file_path="$(dirname "${file_name}")"

        # Change directory to the file's location silently
        pushd "${file_path}" > /dev/null

        # Check for multiple occurences of files with the same tag
        if [ -e "${tag}.html" ]; then
            counter="$(find ./ -maxdepth 1 -type f -name "${tag}.html" -o -name "${tag}_*.html" | wc -l)"
            tag="${tag}_${counter}"
        fi

        # Rename the file
        mv "${file_name}" "${tag}.html"

        # Return to previous directory silently
        popd > /dev/null
    fi

}

# Necessary in order to call rename_file from find command within main
export -f rename_file

# The entry point function of the script. This function searches for all the
# html files in the directory that the script is run, and all subdirectories.
# The function calls rename_files upon each of the found files.
main ()
{
    find ./ -type f -name "*.html" -exec bash -c 'rename_file "${1}"' _ {} \;
}

main

CodePudding user response：

Solution :

# create test files
rm *.html
echo '<div id="myID" style="display:none">1_1</div>' > 1.html
echo '<div id="myID" style="display:none">1_2</div>' > 2.html

# loop throught matching files
while IFS= read -r l_line
do
    # get file name using exact match -o
    l_name=$( echo "$l_line" | egrep -o ".*html" )

    # get tag value using perl reg exp : values between >< , 
    # \K (?=) remove brackets so without them we will get simple reg exp ">.*<"
    # details https://unix.stackexchange.com/questions/13466/can-grep-output-only-specified-groupings-that-match
    l_new_name="$( echo "$l_line" | grep -oP ">\K.*(?=<)" ).html"

    # for echoing let use this variable
    l_cmd="mv $l_name $l_new_name"

    # rename if file does not exist
    if [[ -f $l_new_name ]]; then
        echo "file $l_new_name already exists, no rename" >&2
    else
        echo "$l_cmd"
        # execute and inform if works
        eval $l_cmd || echo "ERROR " >&2
    fi
done <<< $(grep -o '<div id="myID" style="display:none">.*</div>' ./*.html)