Home > Net >  Linux / bash write to new file if csv value is satisfied
Linux / bash write to new file if csv value is satisfied

Time:11-06

I have a CSV file containing the following:

Name,Price,Date,SKU

I the date is formatted yyy-mm-dd and I want to take all items in the file from 2020-01 and put them in their own file in a different directory.

The results file will not exist at first but I need to run this on 3 csv files so it will exist for the last 2 runs. I want to maintain the old data still, rather just add all the data with the year 2020 into a CSV file.

what I have now:

awk -F, '$3 ~ /2020-01/ {print}' sourceFile.csv > data/2020/01/destonationFile.csv

I have tried to play around with grep and awk, but do not know how to properly check if the date is correct. Any advice helps.

sample data as source - sourceData.csv

Name,Price,Date,SKU
pixel 5, 20000, 2020-01-04, 124124
iphone 8, 35000,2019-12-11, 124125
note 20 , 50000, 2020-04-16, 124127
note 20 ultra, 60000, 2020-01-12, 124128
s 8, 15000, 2017, 124129

Sample data as output - destonationFile.csv

pixel 5, 20000, 2020-01-04, 124124
note 20 ultra, 60000, 2020-01-12, 124126

CodePudding user response:

Try this:

grep '2020-01' sourceData.csv > destonationFile.csv

sourceData.csv content:

Name,Price,Date,SKU
pixel 5, 20000, 2020-01-04, 124124
iphone 8, 35000,2019-12-11, 124125
note 20 , 50000, 2020-04-16, 124127
note 20 ultra, 60000, 2020-01-12, 124128
s 8, 15000, 2017, 124129

destonationFile.csv content:

pixel 5, 20000, 2020-01-04, 124124
note 20 ultra, 60000, 2020-01-12, 124128

CodePudding user response:

A few thoughts. (1) you were really close and you are thinking along the correct lines. (2) your data file format contains both ',' and ", " as delimiters in places that will cause problems (e.g. "35000,2019-12-11" and "20000, 2020-01-04"). (3) To consistently identify "2020-01..." at the beginning of the field, you want to anchor your REGEX to the beginning with '^'. Additionally, print is the default action taken by awk, so if you have no additional rules to apply, you can simply omit it.

Putting that together and checking for the first record in the file FNR==1 to also grab the heading for your destination file, you can do:

awk -F",|, " 'FNR==1 || $3~/^2020-01/' sourceData.csv

Where -F",|, " allows you to use alternate field-separators to handle the ',' and ", " separators.

Since you are outputting the entire record that is matching your selection criteria, you can be assured that the records are not modified. So long as the selection is working as you intent, your output file will hold the same content for the records selected from your input file.

Example Use/Output

With your data in sourceData.csv, the above would produce:

$ awk -F",|, " 'FNR==1 || $3~/^2020-01/' sourceData.csv
Name,Price,Date,SKU
pixel 5, 20000, 2020-01-04, 124124
note 20 ultra, 60000, 2020-01-12, 124128

If you don't want the heading in the destination file, simply remove the "FNR==1 || " part

Let me know if you have questions.

  • Related