Regex - to remove date, time and file extension

Time:04-30

Introduction

Hi! I've been scanning a bunch of domains for their subdomains. Each result is a JSON file named after the domain, with the date and time appended and separated by underscores (_):

521wei.xin_2022_04_03_13_09_20.json
wechat.design_2022_04_02_15_13_13.json
wechat.org_2022_04_01_15_47_58.json
admin.wechat.com_2022_04_01_15_37_35.json
wechatapp.com_2022_04_01_16_38_38.json
api.weixin.qq.com_2022_04_01_15_55_38.json
wechatlegal.cn_2022_04_01_16_20_20.json

The goal is to extract the domain names and discard the rest (date, time, and file extension).

Problem

I've been trying to use a sed regex to match the timestamp:

'/_[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]{2}_[0-9]{2}_[0-9]{2}.json/d'

But it does nothing. Any help would be very much appreciated.
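As far as I can tell, the attempt fails for three reasons: without `-E`, sed uses POSIX basic regular expressions, where an unescaped `{4}` is a literal brace sequence rather than an interval (BRE needs `\{4\}`); the `d` command deletes the whole matching line instead of removing the matched part; and the unescaped `.` matches any character. Below is a minimal sketch of a corrected version, assuming a sed that supports `-E` (GNU and BSD sed both do) and one filename per line. The function name `strip_suffix` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads filenames (one per line) on stdin and prints
# only the domain part, dropping the _YYYY_MM_DD_HH_MM_SS.json suffix.
strip_suffix() {
  # -E        extended regexes, so {4} and {2} are intervals
  # s/...//   replace the match with nothing instead of deleting the line (d)
  # \.        a literal dot; $ anchors the match to the end of the line
  sed -E 's/_[0-9]{4}(_[0-9]{2}){5}\.json$//'
}

printf '%s\n' \
  'admin.wechat.com_2022_04_01_15_37_35.json' \
  'api.weixin.qq.com_2022_04_01_15_55_38.json' | strip_suffix
```

The `(_[0-9]{2}){5}` group covers month, day, hour, minute and second. With a list file such as `listWechat.txt`, the call would be `strip_suffix < listWechat.txt`.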

Answer:

You can achieve a similar result to the sed-based answer with the cut command:

cut -d '_' -f '1' listWechat.txt
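`cut -d '_' -f 1` keeps everything before the first underscore, which works here on the assumption that no domain name contains `_` (ordinary hostnames don't). If you'd rather loop over the files directly instead of going through a list file, POSIX parameter expansion does the same split without an external command. This is a sketch; `domain_of` is a hypothetical name, and the `*.json` glob assumes the files sit in the current directory.

```shell
#!/bin/sh
# Hypothetical helper: print the part of a filename before the first '_'.
# ${1%%_*} removes the longest suffix of $1 matching '_*', i.e. everything
# from the first underscore onward, like cut -d '_' -f 1.
domain_of() {
  printf '%s\n' "${1%%_*}"
}

# Apply it to every .json file in the current directory (assumed layout).
for f in *.json; do
  [ -e "$f" ] || continue   # skip the literal '*.json' when nothing matches
  domain_of "$f"
done
```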

Answer:

You can use the following GNU sed command to strip the timestamps and extension from the filenames:

sed -e 's/[0-9_]\+\.json/ /gm'


Sample usage:
echo "521wei.xin_2022_04_03_13_09_20.json \
wechat.design_2022_04_02_15_13_13.json \
wechat.org_2022_04_01_15_47_58.json \
admin.wechat.com_2022_04_01_15_37_35.json \
wechatapp.com_2022_04_01_16_38_38.json \
api.weixin.qq.com_2022_04_01_15_55_38.json \
wechatlegal.cn_2022_04_01_16_20_20.json" | sed -e 's/[0-9_]\+\.json/ /gm'
Output:
521wei.xin  wechat.design  wechat.org  admin.wechat.com  wechatapp.com  api.weixin.qq.com  wechatlegal.cn
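The same pattern can be written as a portable extended regex (`+` instead of GNU's `\+`) and applied to one filename per line with an empty replacement, so no stray spaces are left behind. This is a sketch under that one-per-line assumption; `strip_run` is a hypothetical name. Note that `521wei.xin` keeps its leading digits: the run of digits and underscores has to be followed immediately by `.json`, so the only place the pattern can match is the timestamp block.

```shell
#!/bin/sh
# The answer's pattern as an ERE for sed -E, with an empty replacement.
strip_run() {
  # [0-9_]+  one or more digits or underscores (the date/time block)
  # \.json   a literal ".json" that must follow immediately
  sed -E 's/[0-9_]+\.json//'
}

printf '%s\n' \
  '521wei.xin_2022_04_03_13_09_20.json' \
  'wechatapp.com_2022_04_01_16_38_38.json' | strip_run
```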

