Home > Software design >  How to read CSV file content and find out between based on string value in shell script
How to read CSV file content and find out between based on string value in shell script

Time:10-14

I have 1 data file 'file1.csv' with two columns data and need to read file content and find out the between rows based on first column string values 'finance & total' and similarly 'marketing & total'. Script should find 'finance' and 'finance total' then get the between (A,B,D,H) records And script should find 'marketing' and 'marketing total' then get the between (C,E,G) records. I don't want to get records before 'Finance' and after 'Marketing Total' string.

Also need to skip lines where 2nd or 3rd fields have nan string.

I tried with below code but not getting expected output.

while read line
do
    if [[ $line == "finance" ]]; then
        echo $line >> output.csv
    else
        echo ""
    fi
done < file1.csv

file1.csv:
Departments Accounts  ##Header
Monthwise data        ##Sub Header
IT,Amount
P,20
q,30
IT Total,50
Finance,Amount
A,20
B,30
D,60
H,50
Finance Total,160
<Empty space>
Marketing,Amount
C,40
E,10
G,60
Marketing Total,110
HR,amount
X,20
Y,50
Z,10

Expected Output:

department,name,amount
Finance,A,20
Finance,B,30
Finance,D,60
Finance,H,50
Marketing,C,40
Marketing,E,10
Marketing,G,60  

Expected output should be stored in output file.

CodePudding user response:

With your shown samples, please try following awk program. Simple explanation would be, setting comma as FS, OFS for all lines and print heading in BEGIN section itself. Then in main program, check different conditions and print the values accordingly. NOTE this program will exit once it finds Marketing Total as per OP's mention.

awk '
BEGIN{
  FS=OFS=","
  print "department,name,amount"
}
/Marketing Total/                          { exit                  }
/Finance Total/ || ($2=="nan" || $3=="nan"){ next                  }
/Finance,Amount/ || /Marketing,Amount/     { found=1;first=$1;next }
found                                      { print first,$0        }
'  Input_file

OR in case nan value needs to be compared as case in-sensitive then try following code.

awk '
BEGIN{
  FS=OFS=","
  print "department,name,amount"
}
/Marketing Total/                                            { exit                  }
/Finance Total/ || (tolower($2)=="nan" || tolower($3)=="nan"){ next                  }
/Finance,Amount/ || /Marketing,Amount/                       { found=1;first=$1;next }
found                                                        { print first,$0        }
'  Input_file
  • Related