Get part of the filename from a list of files, avoiding duplicates

I have a file, "main.log", containing a list of filenames:

admin-dev.log
admin-prod.log
type1-app-dev-1.24.2.log
type1-app-dev-latest.log
type1-app-prod-1.24.2.log
type1-app-prod-1.24.3.log
type1-app-prod-latest.log
type2-app-stage-1.24.2.log
type2-app-dev-1.38.6.log
type2-app-dev-latest.log
type2-app-prod-1.38.6.log
type2-app-prod-1.38.7.log
type2-app-prod-latest.log

How can I extract the filenames from "main.log" and print only the version numbers for each type (type1 or type2), without duplicate versions?

Expected result from example above:

type1: 1.24.2
type1: 1.24.3
type1: latest
type2: 1.38.6
type2: 1.38.7
type2: latest

This is what I have managed so far:

file_list="main.log";

# Read each line from the file
while IFS= read -r line; do
    if [ "$line" != "$file_list" ]; then

        # filter type1
        ver1=$(echo "$line" | grep -o 'type1[^<]*' | sed -e 's/type1-app-\(.*\).apk/\1/');
        
        #filter type2
        ver2=$(echo "$line" | grep -o 'type2[^<]*' | sed -e 's/type2-app-\(.*\).apk/\1/');

        if [ "$ver1" ]; then
            echo "type1: $ver1";
        else
            echo "type2: $ver2";
        fi
    fi
done < $file_list

CodePudding user response：

This may be what you want:

sed -n 's/^\(type.\).*-\([^-]*\)\.log$/\1: \2/p' main.log | sort -u
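
A minimal sketch of how that expression behaves, run against a throwaway sample file. The function name extract_versions and the temporary file are illustrative only and do not come from the question. The first group, \(type.\), captures the prefix; the greedy .*- consumes everything up to the last hyphen; \([^-]*\) captures whatever sits between that hyphen and .log. Lines that do not start with "type" are dropped because -n suppresses default output and only successful substitutions are printed by the p flag.

```shell
# Hypothetical helper: print "typeN: version" pairs, one per unique pair.
extract_versions() {
    sed -n 's/^\(type.\).*-\([^-]*\)\.log$/\1: \2/p' "$1" | sort -u
}

# Small sample input (a subset of the list in the question).
sample=$(mktemp)
printf '%s\n' \
    'admin-dev.log' \
    'type1-app-dev-1.24.2.log' \
    'type1-app-prod-1.24.2.log' \
    'type1-app-prod-latest.log' \
    'type2-app-dev-1.38.6.log' > "$sample"

# Prints:
#   type1: 1.24.2
#   type1: latest
#   type2: 1.38.6
extract_versions "$sample"
rm -f "$sample"
```

Note that on the full list from the question this would also print "type2: 1.24.2", because type2-app-stage-1.24.2.log is part of the input even though that line is missing from the expected result.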

CodePudding user response：

With your shown samples, please try the following awk and sort code, written and tested with GNU awk. It uses the three-argument form of match(), a GNU awk extension that stores the regex capture groups in an array (arr here) so their values can be used later in the program. This first version prints numeric versions only.


awk '
match($0,/(^type[0-9]+)-[^-]*-[^-]*-([0-9]+(\.[0-9]+)+)\.log$/,arr){
  print arr[1]": "arr[2]
}
' Input_file | sort -u

OR: if you also want to include the string latest in the output, as in your expected result, try the following code instead:

awk '
match($0,/(^type[0-9]+)-[^-]*-[^-]*-(.*)\.log$/,arr){
  print arr[1]": "arr[2]
}
' Input_file | sort -u
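
If GNU awk is not available, the three-argument match() will not work. A rough portable sketch, assuming every relevant name looks like typeN-app-ENV-VERSION.log as in the samples: split on hyphens instead and deduplicate inside awk with an array. The function name list_versions and the array name seen are illustrative, not part of the answer above.

```shell
# Hypothetical helper: works without gawk's match() array extension.
list_versions() {
    awk -F'-' '
    /^type[0-9]+-/ && /\.log$/ {
        ver = $NF                  # last hyphen-separated field, e.g. "1.24.2.log"
        sub(/\.log$/, "", ver)     # strip the extension
        key = $1 ": " ver          # e.g. "type1: 1.24.2"
        if (!(key in seen)) {      # print each pair only the first time it appears
            seen[key] = 1
            print key
        }
    }' "$1"
}

# Usage (output is in first-seen order; pipe through sort if needed):
#   list_versions main.log | sort
```

Deduplicating in awk keeps the first-seen order, so sort is only needed when sorted output matters.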