I have a large CSV file with many columns, several of which contain timestamps. I want to filter the data for a specific year based on only one of those columns.
Example of what some of my input CSV looks like: (there are no headers)
17263847 | 11/20/2018 3:00:13 PM | 11/23/2018 6:45:00 AM | Approved |
19483742 | 12/22/2019 4:00:12 PM | 1/10/2020 4:50:11 AM | Approved |
38274938 | 10/10/2018 2:02:19 PM | 02/07/2019 1:04:15 PM | Approved |
I want to extract all the rows that have 2019 in the second column; so for the example here, I would want to extract the 2nd row but not the 3rd row. Then, I want all of those rows to be put into a new CSV file.
Is there a simple way to do this using grep on the command line? I tried this, but it's not working:
awk -F, '$1=="2019"' file1.csv > file2.csv
Any help would be appreciated!
CodePudding user response:
First of all, in awk the second column is not $1 but $2 (remember that $0 refers to the whole line/record).
Second, instead of the == literal comparison, use the regex matching operator ~, because the field holds a full timestamp and is never exactly equal to "2019".
The command you need is:
awk -F, -e '$2 ~ /2019/' file1.csv > file2.csv
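For a quick check, here is a minimal sketch of a slightly stricter variant. It assumes the real file is comma-separated (the pipes in the question being only a way of displaying the columns) and that no field contains a quoted comma. The pattern \/2019 followed by a space is my own tightening, not part of the original command: it only matches 2019 when it is the year part of the date. The -e option is not part of POSIX awk, so the sketch omits it; the program text is simply given as the first argument.

```shell
# Sample input, assuming comma-separated fields with no header row
cat > file1.csv <<'EOF'
17263847,11/20/2018 3:00:13 PM,11/23/2018 6:45:00 AM,Approved
19483742,12/22/2019 4:00:12 PM,1/10/2020 4:50:11 AM,Approved
38274938,10/10/2018 2:02:19 PM,02/07/2019 1:04:15 PM,Approved
EOF

# Keep only rows whose second field contains "/2019 ",
# i.e. 2019 as the year of the date in column 2
awk -F, '$2 ~ /\/2019 /' file1.csv > file2.csv

# Show the result: only the row starting with 19483742 should remain
cat file2.csv
```

The third row is skipped even though it contains 2019, because that year appears in the third column rather than the second.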