How to extract a specific year from a column of timestamps in a csv file

Time:10-19

I have a large CSV file with many columns, and multiple columns that have timestamps. I want to filter the data for a specific year based on only 1 of those columns.

Example of what some of my input CSV looks like: (there are no headers)

17263847 11/20/2018 3:00:13 PM 11/23/2018 6:45:00 AM Approved
19483742 12/22/2019 4:00:12 PM 1/10/2020 4:50:11 AM Approved
38274938 10/10/2018 2:02:19 PM 02/07/2019 1:04:15 PM Approved

I want to extract all the rows that have 2019 in the second column; so for the example here, I would want to extract the 2nd row but not the 3rd row. Then, I want all of those rows to be put into a new CSV file.

Is there a simple way to do this with grep or awk on the command line? I tried this, but it's not working:

awk -F, '$1=="2019"' file1.csv > file2.csv

Any help would be appreciated!

CodePudding user response:

  • First of all, in awk the second column is $2, not $1 (remember that $0 refers to the whole line, i.e. the record).

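  • Second: instead of the == literal comparison, which is true only when the entire field is exactly 2019, use the regex match operator ~, which is true when 2019 appears anywhere in the field (first tutorial I found).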
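To see the difference between the two operators, here is a small sketch run against one row from the question. It assumes the fields are comma-separated (as the -F, in the question implies); the variable names eq, re and result are only illustrative.

```shell
# One row from the question, with commas assumed as the field separator.
row='19483742,12/22/2019 4:00:12 PM,1/10/2020 4:50:11 AM,Approved'

# eq: is field 2 exactly the string "2019"?   re: does field 2 contain 2019?
result=$(printf '%s\n' "$row" | awk -F, '{ eq = ($2 == "2019"); re = ($2 ~ /2019/); print eq, re }')

echo "$result"   # prints: 0 1
```

The exact comparison fails because the field holds the full timestamp, while the regex match succeeds.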

The command you need is:

awk -F, '$2 ~ /2019/' file1.csv > file2.csv
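As an end-to-end check, the sketch below builds a small test file from the three rows in the question and filters it. The file names sample.csv and only_2019.csv are hypothetical, and the commas between fields are an assumption, since the question shows the rows without a visible separator. The pattern is also tightened to /\/2019 / so that it matches only a year that follows a slash and precedes the space before the time; this is a variation on the command above and assumes the M/D/YYYY H:MM:SS format shown in the question.

```shell
# Hypothetical sample file: the question's three rows, commas assumed.
cat > sample.csv <<'EOF'
17263847,11/20/2018 3:00:13 PM,11/23/2018 6:45:00 AM,Approved
19483742,12/22/2019 4:00:12 PM,1/10/2020 4:50:11 AM,Approved
38274938,10/10/2018 2:02:19 PM,02/07/2019 1:04:15 PM,Approved
EOF

# Keep only the rows whose second field contains "/2019 " (year, then a space).
awk -F, '$2 ~ /\/2019 /' sample.csv > only_2019.csv

# Show the result: only the 19483742 row should remain.
cat only_2019.csv
```

The third row has 2019 only in its third column, so it is left out, which is the behaviour the question asks for.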