How to get all the lines by date from a text file in Linux

Time:08-08

I have a text file that contains almost 23 million lines (4 fields per line).

Format of one line:

2014-06-01|2024-07-30|3515034013|50008
  • 1st field is the Activation date.
  • 2nd field is Expiry date.
  • 3rd field is code
  • 4th field is ID

and I want to get all the lines for the following:

(01). Expiry before 2022-08-06.
(02). 2 years of validity, i.e. expiry between 2022-08-06 and 2024-08-06 (2 years from 2022-08-06).

Can I use sed/awk for this? I am confused about how to use them. Can anyone please help or give me an idea of how to do this?

CodePudding user response:

Not sure how to do it with sed or awk, but you can use something like this:

I've created a test file, data.txt:

2014-06-01|2021-07-30|3515034013|50008
2014-06-01|2024-07-30|3515034013|50008

And this script.sh:

#!/bin/bash
input="./data.txt"
expCondition="2022-08-06"

# Read each line, splitting its four fields on "|"
while IFS='|' read -r actDate expDate code id
do
  # ISO dates (YYYY-MM-DD) compare correctly as plain strings
  if [[ "$expDate" < "$expCondition" ]]
  then
    echo "$actDate|$expDate|$code|$id"
  fi
done < "$input"

This handles only the first check (expiry date earlier than the cutoff). You can change the "if" condition to cover the second case :)
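For the second case, the condition could look roughly like the sketch below. Bash's [[ ]] only offers < and > for string comparison, so an inclusive range is written by negating them. The names start, end and in_window are hypothetical, and the sketch assumes both boundary dates are meant to be included.

```shell
#!/bin/bash
# Hypothetical window boundaries taken from the question
start="2022-08-06"
end="2024-08-06"

# Succeeds when the given expiry date lies within [start, end].
# "not less than start" and "not greater than end" give an inclusive range.
in_window() {
  local expDate=$1
  [[ ! "$expDate" < "$start" && ! "$expDate" > "$end" ]]
}

# Example calls with the two expiry dates from data.txt
in_window "2024-07-30" && echo "2024-07-30 is inside the window"
in_window "2021-07-30" || echo "2021-07-30 is outside the window"
```

Inside the loop, in_window "$expDate" would take the place of the "if" test.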

CodePudding user response:

Can I use sed/awk for this? I am confused about how to use that and can anyone please help or give an idea to reach that?

You can use awk. Your dates are zero-padded and big-endian (year, month, day), so comparing them as strings gives the expected result. Consider the following simple example. Let file.txt contain

2014-06-01
2024-07-30

then

awk '{print $1, $1<"2022-08-06", "2022-08-06"<=$1&&$1<="2024-08-06"}' file.txt

gives output

2014-06-01 1 0
2024-07-30 0 1

Explanation: I compare the dates as strings, which gives the expected result because the format allows such a comparison. This is not specific to awk; it works in any language that supports lexicographical comparison or lexicographical sorting. Be warned that this will fail for other date formats, e.g. U.S. middle-endian (month, day, year).

(tested in gawk 4.2.1)
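Applied to the pipe-delimited file from the question, the same comparison might look like the sketch below. It assumes the expiry date is the 2nd field, that both window boundaries are inclusive, and that each case should go to its own output file. The names expired.txt and expiring.txt are hypothetical, and the third sample line is made up to show a non-matching record.

```shell
#!/bin/bash
# Hypothetical sample data in the question's format: activation|expiry|code|ID
workdir=$(mktemp -d)
cat > "$workdir/data.txt" <<'EOF'
2014-06-01|2021-07-30|3515034013|50008
2014-06-01|2024-07-30|3515034013|50008
2014-06-01|2025-01-15|3515034013|50009
EOF

# -F'|' splits each line on the pipe, so $2 is the expiry date.
# The dates are not numeric-looking, so awk compares them as strings.
awk -F'|' -v start="2022-08-06" -v end="2024-08-06" -v dir="$workdir" '
  $2 < start               { print > (dir "/expired.txt") }   # case (01)
  $2 >= start && $2 <= end { print > (dir "/expiring.txt") }  # case (02)
' "$workdir/data.txt"

echo "Expired before 2022-08-06:"
cat "$workdir/expired.txt"
echo "Expiring between 2022-08-06 and 2024-08-06:"
cat "$workdir/expiring.txt"
```

Both cases are handled in a single pass over the file, which is likely to matter with 23 million lines.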
