Filtering large data file by date using command line


I have a CSV file that contains a bunch of data, with one of the columns being a date. I am trying to extract all lines that have dates in a specific year and save them to a new file.

The format of the file is like this, with the date and time in the second column:

000000000,10/04/2021 02:10:15 AM,.....

So far I tried:

grep -E ^2020 data.csv >> temp.csv

But it just produced an empty temp.csv. Any ideas on how I can do this?

CodePudding user response:

One potential solution is with awk:

awk -F"," '$2 ~ /\/2020 /' data.csv > temp.csv

Another potential option is with grep:

grep "\/2020 " data.csv > temp.csv

However, the grep solution may match "/2020 " anywhere in a line, not just in the second column.
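To illustrate, a hypothetical line such as

000000003,10/04/2019 02:10:15 AM,shipped 10/04/2020 09:00:00 AM

would be picked up by the plain grep, because "/2020 " appears in the third column, while the awk version ignores it, since the test only looks at the second field.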

CodePudding user response:

Although the awk solution is the better fit here, e.g.

awk -F, 'index($2, "/2021 ")' file
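For instance, piping the sample line from the question through the command:

echo '000000000,10/04/2021 02:10:15 AM,.....' | awk -F, 'index($2, "/2021 ")'

prints the line, because index() returns the (non-zero, hence truthy) starting position of "/2021 " within the second field; for a 2020 row it would return 0 and the line would be skipped.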

grep can also be used here:

grep  '^[^,]*,[^,]*/2021 ' file


Notes:

  • awk -F, 'index($2, "/2021 ")' splits each line (record) into fields on commas (see -F,), and prints the line if the second field ($2) contains the substring /2021 followed by a space
  • the ^[^,]*,[^,]*/2021 pattern in the grep command (note the trailing space after 2021) matches
    • ^ - start of string
    • [^,]* - zero or more non-comma chars
    • ,[^,]* - a , and zero or more non-comma chars
    • /2021 - the literal substring /2021, followed by a space (matching the space before the time).
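As for why the original grep -E ^2020 attempt produced an empty file: ^ anchors the match to the start of the line, and these lines begin with the numeric ID column, not the year, so nothing matches. A sketch of a variant that anchors the year inside the second column instead (untested, and assuming the MM/DD/YYYY format shown in the question):

grep -E '^[^,]*,[0-9]{2}/[0-9]{2}/2020 ' data.csv > temp.csv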