I have a csv table that has a column that contains the date difference in days as integer values, and I want to split the file into 30 day intervals, such that the first file contains rows with the column values 0-30, the second one containing 31-60, and so on. How can I do that using the awk or split command?
Sample data:
ID, creation date, answer date, difference in days
1190, 2017-10-24 13:57:15.91 UTC, 2017-10-24 15:14:00.53 UTC, 0
610, 2017-10-24 13:57:15.91 UTC, 2017-10-24 14:58:17.77 UTC, 0
Whenever the last column (difference in days)
reaches 30 then I want to split the file with all the previous rows moved to the new file.
I tried this awk command but the results are not what I wanted:
awk -F, '{if($14 % 30 == 0) x="F" i;}{print > $14/30"months.csv"}' file.csv
CodePudding user response:
Firstly I suggest using integer division here rather than increase-when-divides-evenly, consider following sequence
1 2 3 4 7 9 10 10
and say you are dividing into size-5 interval, now if you increase-when-divides-evenly that would give
1 2 3 4 7 9 - 10 - 10
so two group were joined into one because no 5
is present and another was split into 2 groups due to repeated 10. Using integer division would give
1 2 3 4 - 7 9 - 10 10
Simple GNU AWK
implementation let file.txt
be
1,1
2,2
3,3
4,4
7,7
9,9
10,A
10,A
then
awk '{print > "file" int($1/5) ".txt"}' file.txt
creates file0.txt
holding
1,1
2,2
3,3
4,4
and file1.txt
holding
7,7
9,9
and file2.txt
holding
10,A
10,A
Explanation: int(x/y)
is integer division it does divide then takes floor.
(tested in GNU Awk 5.0.1)
CodePudding user response:
awk -F, 'BEGIN{ i=0; x="F" i }{print > "months_"x".csv"}{if($4 % 30 == 0 && $4>0) x="F" i;}' input.csv
BEGIN{ i=0; x="F" i }
initializei
andx
{print > "months_"x".csv"}
copy line to "months_{x}.csv",{if($4 % 30 == 0 && $4>0) x="F" i;}
Only after copying the line, check if the column (the 4th one) has reached 30, and changex
.
NOTE: The header will only be in the first output file ("months_F1.csv")
With this input:
ID, creation date, answer date, difference in days
1190, 2017-10-24 13:57:15.91 UTC, 2017-10-24 15:14:00.53 UTC, 0
610, 2017-10-24 13:57:15.91 UTC, 2017-10-24 14:58:17.77 UTC, 0
610, 2017-10-24 13:57:15.91 UTC, 2017-10-24 14:58:17.77 UTC, 30
1190, 2017-10-24 13:57:15.91 UTC, 2017-10-24 15:14:00.53 UTC, 0
610, 2017-10-24 13:57:15.91 UTC, 2017-10-24 14:58:17.77 UTC, 0
610, 2017-10-24 13:57:15.91 UTC, 2017-10-24 14:58:17.77 UTC, 30
the following files will be created: "months_F1.csv"
ID, creation date, answer date, difference in days
1190, 2017-10-24 13:57:15.91 UTC, 2017-10-24 15:14:00.53 UTC, 0
610, 2017-10-24 13:57:15.91 UTC, 2017-10-24 14:58:17.77 UTC, 0
610, 2017-10-24 13:57:15.91 UTC, 2017-10-24 14:58:17.77 UTC, 30
and "months_F2.csv"
1190, 2017-10-24 13:57:15.91 UTC, 2017-10-24 15:14:00.53 UTC, 0
610, 2017-10-24 13:57:15.91 UTC, 2017-10-24 14:58:17.77 UTC, 0
610, 2017-10-24 13:57:15.91 UTC, 2017-10-24 14:58:17.77 UTC, 30