Is it possible using a Linux command or shell script to remove unwanted rows in a file based on certain criteria?

Time: 01-28

I have a file with ~12,300,000 rows of the form <timestamp, reading>:

1674587549.228 29214
1674587549.226 29384
1674587549.226 27813
1674587549.226 28403
1674587549.228 28445
...
1674587948.998 121
1674587948.998 126
1674587948.999 119
1674587949.000 126
1674587948.996 156
1674587948.997 152
1674587948.998 156
1674588149.225 316
1674588149.226 310
1674588149.223 150
1674588149.224 152
1674588149.225 150
1674588149.225 144
...
1674588149.225 227
1674588149.226 233
1674588149.226 275

The last timestamp minus the first timestamp equals 600 (seconds). I want to create a new file containing the rows from (last timestamp - n) seconds up to the end of the file.

For example, if n=200, the new file should start around timestamp 1674588149.226 - 200, i.e. run from the row 1674587949.000 126 through the row 1674588149.226 275.

Can this be done using a Linux command / shell script? If so, how can it be done? Thanks.

CodePudding user response:

If I understood correctly, you are trying to create files that each contain the same fixed number of lines, working backwards from the last line of the input.

If so, this script will perform the task.

If you only want one file, then you can remove the logic associated with the looping and index value iterations.

Note: The name of each file corresponds to the first field of the last line in that file (i.e. the timestamp of its last record).

This example splits into groups of 5 lines; replace the 5 with 100 or 200 as you see fit.

#!/bin/bash

input="testdata.txt"
cat >"${input}" <<"EnDoFiNpUt"
1674587948.998 121
1674587948.998 126
1674587948.999 119
1674587948.996 156
1674587948.997 152
1674587948.998 156
1674587949.000 126
1674588149.225 316
1674588149.226 310
1674588149.223 150
1674588149.224 152
1674588149.225 150
1674588149.225 144
1674588149.225 227
1674588149.226 233
1674588149.226 275
EnDoFiNpUt

awk -v slice="5" 'BEGIN{
    split("", data) ;
    dataIDX=0 ;
}
{
    # Buffer the entire file, indexed by line number.
    dataIDX++ ;
    data[dataIDX]=$0 ;
}
END{
    # Start with the slice that ends on the last line of the file.
    slLAST=dataIDX ;

    slFIRST=slLAST-slice+1 ;
    if( slFIRST <= 0 ){
        slFIRST=1 ;
    } ;

    k=0 ;
    while( slLAST > 0 ){
        k++ ;
        # Name the file after the timestamp of the slice last line, plus a slice index.
        split(data[slLAST], datline, " " ) ;
        fname=sprintf("%s__%d.txt", datline[1], k ) ;
        printf("\t New file: %s\n", fname ) | "cat >&2" ;

        for( i=slFIRST ; i<=slLAST ; i++ ){
            print data[i] >fname ;
        } ;

        if( slFIRST == 1 ){
            exit ;
        } ;

        # Step backwards to the previous slice.
        slLAST=slFIRST-1 ;
        slFIRST=slLAST-slice+1 ;
        if( slFIRST <= 0 ){
            slFIRST=1 ;
        } ;
    } ;
}' "${input}"
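
Note that the script above slices by a fixed line count. If you instead want the time-based cut the question describes (keep only rows whose timestamp falls within the last n seconds), a minimal two-pass awk sketch along the same lines could look like this; it reads the input file twice (so it will not work on a pipe), and the output name lastwindow.txt is just a placeholder:

awk -v n="200" '
NR==FNR { if( $1+0 > max ){ max=$1+0 } ; next }   # pass 1: find the latest timestamp
$1+0 >= max-n                                     # pass 2: keep rows within the last n seconds
' "${input}" "${input}" >lastwindow.txt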

CodePudding user response:

If you only want the last 200 lines of a log, then the simplest approach is to use tail. Namely

tail -n 200 log.txt >"${newLogName}"

If you want to create multiple files of 200 lines each, you could use the sequence

tac log.txt | tail -n +201 | tac >log.remain
mv log.remain log.txt

in a loop that includes assigning a unique name ${newLogName} to each slice, as sketched below.
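
Spelled out, that loop could look like the sketch below (a minimal, illustrative version: the names slice_1.txt, slice_2.txt, ... are arbitrary, slice_1.txt holds the newest lines, and the loop consumes log.txt, so run it on a copy):

#!/bin/bash
n=200
k=0
while [ -s log.txt ]; do
    k=$((k+1))
    tail -n "${n}" log.txt >"slice_${k}.txt"              # save the last n lines as a slice
    tac log.txt | tail -n +"$((n+1))" | tac >log.remain   # drop those lines from the log
    mv log.remain log.txt
done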

OR, you could reverse the log once at the outset and build the sublists working down the reversed list, remembering to reverse each individual sublist before saving it in its final form.
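
One way to realize that reverse-first idea, assuming GNU coreutils (tac and split; the rev_slice_ prefix and .final.txt suffix are arbitrary choices):

tac log.txt >log.rev                  # reverse the whole log once
split -l 200 log.rev rev_slice_       # cut the reversed log into 200-line pieces
for f in rev_slice_*; do
    tac "$f" >"${f}.final.txt"        # restore original line order within each piece
    rm "$f"
done

Here rev_slice_aa.final.txt holds the last 200 lines of the original log, rev_slice_ab.final.txt the 200 before those, and so on.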
