I have a CSV file containing the following:
Name,Price,Date,SKU
I the date is formatted yyy-mm-dd
and I want to take all items in the file from 2020-01
and put them in their own file in a different directory.
The results file will not exist at first but I need to run this on 3 csv files so it will exist for the last 2 runs. I want to maintain the old data still, rather just add all the data with the year 2020 into a CSV file.
what I have now:
awk -F, '$3 ~ /2020-01/ {print}' sourceFile.csv > data/2020/01/destonationFile.csv
I have tried to play around with grep and awk
, but do not know how to properly check if the date is correct.
Any advice helps.
sample data as source - sourceData.csv
Name,Price,Date,SKU
pixel 5, 20000, 2020-01-04, 124124
iphone 8, 35000,2019-12-11, 124125
note 20 , 50000, 2020-04-16, 124127
note 20 ultra, 60000, 2020-01-12, 124128
s 8, 15000, 2017, 124129
Sample data as output - destonationFile.csv
pixel 5, 20000, 2020-01-04, 124124
note 20 ultra, 60000, 2020-01-12, 124126
CodePudding user response:
Try this:
grep '2020-01' sourceData.csv > destonationFile.csv
sourceData.csv content:
Name,Price,Date,SKU
pixel 5, 20000, 2020-01-04, 124124
iphone 8, 35000,2019-12-11, 124125
note 20 , 50000, 2020-04-16, 124127
note 20 ultra, 60000, 2020-01-12, 124128
s 8, 15000, 2017, 124129
destonationFile.csv content:
pixel 5, 20000, 2020-01-04, 124124
note 20 ultra, 60000, 2020-01-12, 124128
CodePudding user response:
A few thoughts. (1) you were really close and you are thinking along the correct lines. (2) your data file format contains both ','
and ", "
as delimiters in places that will cause problems (e.g. "35000,2019-12-11"
and "20000, 2020-01-04"
). (3) To consistently identify "2020-01..."
at the beginning of the field, you want to anchor your REGEX to the beginning with '^'
. Additionally, print
is the default action taken by awk
, so if you have no additional rules to apply, you can simply omit it.
Putting that together and checking for the first record in the file FNR==1
to also grab the heading for your destination file, you can do:
awk -F",|, " 'FNR==1 || $3~/^2020-01/' sourceData.csv
Where -F",|, "
allows you to use alternate field-separators to handle the ','
and ", "
separators.
Since you are outputting the entire record that is matching your selection criteria, you can be assured that the records are not modified. So long as the selection is working as you intent, your output file will hold the same content for the records selected from your input file.
Example Use/Output
With your data in sourceData.csv
, the above would produce:
$ awk -F",|, " 'FNR==1 || $3~/^2020-01/' sourceData.csv
Name,Price,Date,SKU
pixel 5, 20000, 2020-01-04, 124124
note 20 ultra, 60000, 2020-01-12, 124128
If you don't want the heading in the destination file, simply remove the "FNR==1 || "
part
Let me know if you have questions.