bash/awk/unix detect changes in lines of csv files

Time:10-28

I have CSV files with a timestamp column in this format:

(normal_file.csv)

timestamp
19/02/2002
19/02/2002
19/02/2002
19/02/2002
19/02/2002
19/02/2002

The dates are usually uniform; however, some files have an irregular date pattern, such as this example:

(abnormal_file.csv)

timestamp
19/02/2002
19/02/2003
19/02/2005
19/02/2006

My directory contains hundreds of files, some normal (like normal_file.csv) and some abnormal (like abnormal_file.csv).

I want to write a bash or awk script that detects the date pattern in every file in a directory. Abnormal files should be moved automatically to a new, separate directory (let's say dir_different/).

So far, I have tried the following:

#!/bin/bash

mkdir dir_different

for FILE in *.csv;

do
  # pipe 1: detect the changes in the line
  # pipe 2: print the timestamp column (first column, columns are comma-separated)
  awk '$1 != prev {print ; prev = $1}' < "$FILE" | awk -F , '{print $1}'
done

If the timestamp in a given file is normal, the output is the header plus a single date; for an abnormal file, several dates are printed.
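To turn that observation into something a script can test, the distinct dates can be counted instead of printed. A minimal sketch, assuming comma-separated columns with the date first and a single header line; `count_dates` is a hypothetical helper name:

```shell
#!/bin/bash
# Print how many distinct values appear in the first comma-separated
# column, skipping the header line (FNR == 1) and any blank lines.
count_dates() {
    awk -F, 'FNR > 1 && NF > 0 && !seen[$1]++ { n++ } END { print n + 0 }' "$1"
}

# Usage (prints 1 for normal_file.csv and 4 for abnormal_file.csv above):
#   count_dates normal_file.csv
```

A count greater than 1 then marks a file as abnormal.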

I am not sure how to separate the abnormal files from the normal ones. I have also tried this:

for FILE in *.csv;
do
   output=$(awk 'FNR==3{print $0}' "$FILE")
   echo "${output}"

   if [[ ${output} =~ ([[:space:]]) ]]
   then
      mv "$FILE" dir_different/
   fi
done
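The idea behind the `FNR==3` check can work if it is applied to the output of the change-detection pipeline rather than to the file itself: that output has a third line only when the date changes at least once. A sketch under the same assumptions (comma separators, date in the first column, one header line); `has_date_change` is a hypothetical name:

```shell
#!/bin/bash
# Succeed (exit 0) when the change-detection output has a third line,
# i.e. the first-column date changes at least once after the header.
has_date_change() {
    # Stage 1: print the first column whenever it differs from the previous
    # non-blank line (header, first date, then one line per change).
    # Stage 2: exit 0 as soon as a third line arrives, otherwise exit 1.
    awk -F, 'NF == 0 { next } $1 != prev { print $1; prev = $1 }' "$1" |
        awk 'NR == 3 { found = 1; exit } END { exit !found }'
}
```

Because it reports through its exit status, it can go straight into the loop's `if`, e.g. `if has_date_change "$FILE"; then mv "$FILE" dir_different/; fi`.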

Or is there an easier way to detect changing lines and separate the files that contain them? Thank you for any suggestions :)

Answer:

Assuming that none of your "normal" CSV files end with blank lines (an empty line would count as an extra distinct value), this should do the separation just fine:

#!/bin/bash
mkdir -p dir_different

for FILE in *.csv;
do
        # a[] collects distinct first-column values; more than 2 (header + one date) means abnormal
        if awk -F, '{a[$1]++}END{if(length(a)<=2){exit 1}}' "$FILE" ; then
                echo mv "$FILE" dir_different
        fi
done

After a dry run, remove the echo to actually move the files :)
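A variant of the same idea that stops reading as soon as a second date appears (which may help with large files) and skips blank lines, so the caveat above about trailing empty lines no longer applies. This is a sketch, not the answerer's code; `is_abnormal` is a hypothetical name, and it also avoids `length()` on an array, which not every awk implementation has historically supported:

```shell
#!/bin/bash
# Exit 0 (abnormal) as soon as a date differs from the first date after
# the header; exit 1 (normal) otherwise. Blank lines are ignored.
is_abnormal() {
    awk -F, '
        NF == 0 { next }                     # skip blank lines
        FNR == 1 { next }                    # skip the header line
        !have { first = $1; have = 1; next } # remember the first date
        $1 != first { found = 1; exit }      # second distinct date: stop early
        END { exit !found }                  # found -> 0, not found -> 1
    ' "$1"
}
```

It slots into the same loop: `if is_abnormal "$FILE" ; then echo mv "$FILE" dir_different ; fi`.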

Answer:

So, a "normal" file contains only two different lines:

timestamp
dd/mm/yyyy

Testing if a file is normal is thus as simple as:

[ $(sort -u file.csv | wc -l) -eq 2 ]
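Comparing whole lines only works while each file has one column. If the files carry more comma-separated columns (as the question's `awk -F ,` suggests), rows that share a date can still differ elsewhere. A sketch that looks at the first column only, assuming no quoted field contains a comma; `is_normal` is a hypothetical name:

```shell
#!/bin/bash
# Succeed (exit 0) when the first comma-separated column holds exactly
# two distinct values: the "timestamp" header and a single date.
is_normal() {
    # Left unquoted so word splitting drops any padding that wc adds.
    [ $(cut -d, -f1 "$1" | sort -u | wc -l) -eq 2 ]
}
```

In the loop this becomes `if ! is_normal "$FILE" ; then echo mv "$FILE" dir_different ; fi`.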

This leads to the following possible solution:

#!/usr/bin/env bash
mkdir -p dir_different

for FILE in *.csv;
do
        if [ $(sort -u "$FILE" | wc -l) -ne 2 ] ; then
                echo mv "$FILE" dir_different
        fi
done
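One edge case in the loop above: if the directory has no `.csv` files, bash leaves `*.csv` unexpanded and the loop runs once on that literal name. A sketch that enables `nullglob` inside a subshell (so the setting does not leak into the caller) and takes the target directory as an argument; `separate_abnormal` is a hypothetical name and, unlike the answer, it calls `mv` directly:

```shell
#!/bin/bash
# Move each *.csv file directly inside directory $1 into $1/dir_different
# unless it holds exactly two distinct lines (header + one date).
# The body runs in a subshell, so shopt and variables do not leak out.
separate_abnormal() (
    dir=$1
    shopt -s nullglob                  # no matches -> empty list, not "*.csv"
    mkdir -p "$dir/dir_different" || exit 1
    for file in "$dir"/*.csv; do
        # Left unquoted so word splitting drops any padding from wc.
        if [ $(sort -u "$file" | wc -l) -ne 2 ]; then
            mv "$file" "$dir/dir_different/"
        fi
    done
)
```

Put `echo` in front of `mv` first for a dry run, as the answers suggest.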