I have a 4 GB JSON file that contains multiple date fields. The date format is 2021-10-15T06:02:50.455Z.
I want to replace this format with a simpler date and time, without the fractional seconds and the trailing Z, like this: 2021-10-15T06:02:50
Is there any way I can do this with the sed command?
sed -e 's/[1-9][0-9]\{3\}-[0-9]\{2\}-[0-1][0-9]T[0-3][0-9]:[0-9]\{2\}:[0-9]\{2\}.[0-9]\{3\}Z/magic_here/g' test.json
I'm looking for a Linux script, but Node or Python would also work.
PS: the regex is working fine.
CodePudding user response:
This looks like a task where a zero-length assertion might be useful. Consider the following example in Python:
import re
txt = "something 2021-10-15T06:02:50.455Z something"
clean = re.sub(r'(?<=\d\d\d\d-\d\d-\d\dT\d\d:\d\d:\d\d)\.\d+Z','',txt)
print(clean)
Output:
something 2021-10-15T06:02:50 something
Explanation: just remove \.\d+Z when it comes right after the pattern describing the date-time part you want to preserve. Note that I used a so-called raw string to make escaping easier; see the re module docs for further discussion. Note that . needs to be escaped as \. because here it denotes a literal dot, and \d denotes any digit. This solution might be reworked for any regex tool that supports positive lookbehind assertions.
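Since the question is about a 4 GB file rather than a single string, here is a minimal sketch of how the same lookbehind pattern could be applied while streaming. It assumes the JSON is spread over many lines (a file stored as one huge line would still be read into memory at once), and the names strip_millis_lines and convert_file as well as the output file name are made up for illustration.

```python
import re

# Lookbehind keeps the date-time part; only the fractional seconds and "Z" are removed.
MILLIS = re.compile(r'(?<=\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d)\.\d+Z')

def strip_millis_lines(lines):
    """Yield each line with the '.455Z'-style suffix removed from timestamps."""
    for line in lines:
        yield MILLIS.sub('', line)

def convert_file(src, dst):
    """Stream src to dst line by line (hypothetical helper, assumes UTF-8 text)."""
    with open(src, encoding='utf-8') as fin, open(dst, 'w', encoding='utf-8') as fout:
        fout.writelines(strip_millis_lines(fin))

# Example call (file names are placeholders):
# convert_file('test.json', 'test.clean.json')
```

Writing to a second file instead of overwriting the input keeps the original around in case the pattern matches something unexpected.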
CodePudding user response:
A simple sed command would be the following (-E enables extended regular expressions and -i edits the file in place):
sed -Ei 's/(T[0-9]{2}:[0-9]{2}:[0-9]{2})[.][0-9]{3}Z/\1/g' file
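For comparison, the same capture-group idea can be sketched in Python, which the question also accepts. This is only an illustration of the technique, not a replacement for the sed command; the function name strip_millis and the sample JSON key are assumptions.

```python
import re

# Same idea as the sed command: capture the time part, drop ".455Z" after it.
PATTERN = re.compile(r'(T[0-9]{2}:[0-9]{2}:[0-9]{2})[.][0-9]{3}Z')

def strip_millis(text):
    """Replace each match with its first capture group (like \\1 in sed)."""
    return PATTERN.sub(r'\1', text)

print(strip_millis('{"created": "2021-10-15T06:02:50.455Z"}'))
# prints {"created": "2021-10-15T06:02:50"}
```

Unlike the lookbehind version, this works in any regex tool that supports backreferences in the replacement, at the cost of rewriting the captured time part.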