linux sed grep -P replace string with newline and taking next line into consideration

Time:12-06

I have a generated file in which I need to remove the last "," (replace it with "") so that it becomes valid JSON. I can't figure out how to do it with sed, or even with grep piped to something else. I am really stumped here. Any help would be appreciated.

test.json

[
{MANY OTHER RECORDS, MAKING FILE 3.5Gig (making sed fail because of memory, so newlines were added)},
{"ID":"57705e4a-158c-4d4e-9e07-94892acd98aa","USERNAME":"jmael","LOGINTIMESTAMP":"2021-11-30"},
{"ID":"b8b67609-50ed-4cdc-bbb4-622c7e6a8cd2","USERNAME":"henrydo","LOGINTIMESTAMP":"2021-12-15"},
{"ID":"a44973d0-0ec1-4252-b9e6-2fd7566c6f7d","USERNAME":"null","LOGINTIMESTAMP":"2021-10-31"},
]

Using grep with -P does match what I need to replace:

grep -Pzo '"},\n]' test.json

CodePudding user response:

Using GNU sed

$ sed -Ez 's/([^]]*),/\1/' test.json
[
{MANY OTHER RECORDS, MAKING FILE 3.5Gig (making sed fail because of memory, so newlines were added)},
{"ID":"57705e4a-158c-4d4e-9e07-94892acd98aa","USERNAME":"jmael","LOGINTIMESTAMP":"2021-11-30"},
{"ID":"b8b67609-50ed-4cdc-bbb4-622c7e6a8cd2","USERNAME":"henrydo","LOGINTIMESTAMP":"2021-12-15"},
{"ID":"a44973d0-0ec1-4252-b9e6-2fd7566c6f7d","USERNAME":"null","LOGINTIMESTAMP":"2021-10-31"}
]

CodePudding user response:

Remove last comma in a file with GNU sed:

sed -zE 's/,([^,]*)$/\1/' file

Output to stdout:

[
{MANY OTHER RECORDS, MAKING FILE 3.5Gig (making sed fail because of memory, so newlines were added)},
{"ID":"57705e4a-158c-4d4e-9e07-94892acd98aa","USERNAME":"jmael","LOGINTIMESTAMP":"2021-11-30"},
{"ID":"b8b67609-50ed-4cdc-bbb4-622c7e6a8cd2","USERNAME":"henrydo","LOGINTIMESTAMP":"2021-12-15"},
{"ID":"a44973d0-0ec1-4252-b9e6-2fd7566c6f7d","USERNAME":"null","LOGINTIMESTAMP":"2021-10-31"}
]

See: man sed and The Stack Overflow Regular Expressions FAQ
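Note that -z makes sed treat the whole file as a single record (a text file contains no NUL bytes), so both sed commands above hold the entire file in memory. A possible alternative, as a sketch only: keep a two-line sliding window with N, P and D so that only two lines are buffered at a time. It assumes GNU sed and that the closing ] sits alone on the line directly after the trailing comma; the sample file is a made-up stand-in for test.json.

```shell
# Hypothetical sample standing in for the real test.json
sample=$(mktemp)
printf '%s\n' '[' '{"ID":"1"},' '{"ID":"2"},' ']' > "$sample"

# $!N  append the next line to the pattern space (except on the last line)
# s    drop a comma that is immediately followed by a newline and "]"
# P    print the first line of the window; D deletes it and restarts the cycle
fixed=$(sed '$!N; s/,\n]/\n]/; P; D' "$sample")
printf '%s\n' "$fixed"
rm -f "$sample"
```

On the real file this would be run as sed '$!N; s/,\n]/\n]/; P; D' test.json > fixed.json, where fixed.json is an arbitrary output name.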

CodePudding user response:

You can buffer two lines and remove the comma when reaching the end of the file:

awk '
    NR > 2 { print line0 }
    {
        line0 = line1
        line1 = $0
    }
    END {
        sub(/,$/,"",line0)
        print line0
        print line1
    }
'

Example:

printf '%s,\n' 1 2 3 4 | awk ...
1,
2,
3
4,
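Since awk does not edit files in place, the result has to go to a new file that then replaces the original. A minimal sketch of that workflow, using the same two-line buffer as above; the sample data and the .tmp file name are assumptions made for illustration:

```shell
# Hypothetical sample standing in for the real test.json
dir=$(mktemp -d)
printf '%s\n' '[' '{"ID":"1"},' '{"ID":"2"},' ']' > "$dir/test.json"

# Same two-line buffer as above, reading the file and writing a fixed copy;
# the original is replaced only if awk succeeds
awk '
    NR > 2 { print line0 }
    { line0 = line1; line1 = $0 }
    END { sub(/,$/, "", line0); print line0; print line1 }
' "$dir/test.json" > "$dir/test.json.tmp" &&
    mv "$dir/test.json.tmp" "$dir/test.json"

result=$(cat "$dir/test.json")
printf '%s\n' "$result"
rm -rf "$dir"
```

For a 3.5 GB file this needs enough free disk space for a second copy, but memory use stays at two lines.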

Another solution would be to use perl to read the last n bytes of the file, then find the position of the target comma and replace it in-place with a space character:

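perl -e '
    # "+<" opens for reading and writing without truncating the file
    open $fh, "+<", $ARGV[0] or die "open: $!";
    $n = 16;                     # number of trailing bytes to inspect
    seek $fh, -$n, 2;            # whence 2 = relative to the end of the file
    $n = read $fh, $str, $n;     # $n is now the number of bytes actually read
    if ( $str =~ /,\s*]\s*$/s ) {
        # $-[0] is the offset of the comma within $str; step back to it
        seek $fh, -($n - $-[0]), 1;
        print $fh " ";           # overwrite the comma with a space
    }
    close $fh;
' log.json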
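A rough shell counterpart of the same idea, as a sketch only: it assumes GNU coreutils (for truncate) and that the file ends with exactly a comma, a newline, ] and a final newline. If the tail looks different, the file is left untouched. The sample file is a made-up stand-in.

```shell
# Hypothetical sample standing in for the real log.json
f=$(mktemp)
printf '%s\n' '[' '{"ID":"1"},' '{"ID":"2"},' ']' > "$f"

# Command substitution strips the final newline, so compare against ",\n]"
if [ "$(tail -c 4 "$f")" = "$(printf ',\n]')" ]; then
    truncate -s -4 "$f"        # GNU coreutils: shrink the file by 4 bytes
    printf '\n]\n' >> "$f"     # re-append the closing bracket without the comma
fi

tail_fixed=$(cat "$f")
printf '%s\n' "$tail_fixed"
rm -f "$f"
```

Unlike the perl version, this shortens the file by one byte instead of overwriting the comma with a space; either way only the last few bytes are read and written.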

Aside: You should fix the code that generates the JSON upstream, so that it at least outputs a sequence of standalone JSON objects instead of trying to build a broken array.
