Delete the content of a file except specific patterns

Time:08-27

I have a file that contains various kinds of information. I'm trying to delete all of the file's content except for a specific pattern, which I want as a list.

The pattern is like this:

/[csc]-[ALPHANUM of 4]-[ALPHANUM]-[ALPHANUM of 3]-[NUM of 8]/

The pattern always starts with csc but can appear in the middle of a line. It has a / at the start of the pattern and sometimes at the end (if that helps), but I only need what's in between.

Example:

/csc-dbc1-repo01x-x11-20210101/

I tried something like this:

grep 'csc-[a-z0-9]{4}-[a-z0-9]-[0-9]{3}-[0-9]{8}' file1 

But it returns nothing. Is there any way I can list these patterns from the file?

Expected result:

csc-dbc1-repo01x-x11-20210301
csc-dvc1-rmco01x-x12-20220104
csc-cbc1-revehq1-A11-20210101

Extract of lines from the file that contain the pattern:

"assets" : [ {
  "downloadUrl" : "https://URL/repository/doc/v&/PROJECT/SUBP/csc-dbc1-repo01x-x11-20210301/DIR/DIR2",
  "path" : "v&/PROJECT/SUBP/csc-dbc1-repo01x-x11-20210301/DIR/DIR2",
  

Answer:

Using sed

$ sed -En 's/.*(csc-[[:alnum:]-]+).*/\1/p' input_file
csc-dbc1-repo01x-x11-20210301
csc-dbc1-repo01x-x11-20210301
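One caveat with this sed approach, shown on a made-up line (not from the OP's file): the leading .* is greedy, so the capture group starts at the last "csc-" on each line. If a single line could hold two different IDs, sed keeps only the last one, while grep -o prints every match.

```shell
#!/bin/sh
# Hypothetical input line containing two different IDs.
line='a/csc-aaaa-b1-ccc-11111111/ x/csc-dddd-e2-fff-22222222/'

# Greedy .* consumes as much as it can, so only the LAST ID is captured.
# Prints: csc-dddd-e2-fff-22222222
printf '%s\n' "$line" | sed -En 's/.*(csc-[[:alnum:]-]+).*/\1/p'

# grep -o prints each non-overlapping match on its own line.
# Prints: csc-aaaa-b1-ccc-11111111 and then csc-dddd-e2-fff-22222222
printf '%s\n' "$line" | grep -Eo 'csc-[[:alnum:]-]+'
```

For the OP's sample, where each line holds one ID, both give the same result.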

Answer:

With GNU awk, you could try the following code, written and tested with the shown samples only. In short: the regex (\\/|\n|[[:space:]])csc(-[[:alnum:]]+)+ matches the required IDs (plus the /, newline or space just before them) and is set as awk's record separator, RS. For each record, the main program strips that extra leading character from RT (the text that matched RS) and prints only the required part.

awk -v RS='(\\/|\n|[[:space:]])csc(-[[:alnum:]]+)+' '
RT{
  sub(/^\/|[[:space:]]|\n/,"",RT)
  print RT
}
'  Input_file
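RT is specific to GNU awk. If only a POSIX awk is available, a common alternative is to loop over match() on each line. This is a sketch, not the answerer's code; it assumes your awk supports bracket expressions such as [[:alnum:]] (some older mawk releases do not), and the temporary sample file is only for the demo.

```shell
#!/bin/sh
# Demo setup: a temporary file holding two lines from the OP's extract.
sample=$(mktemp)
cat > "$sample" <<'EOF'
  "downloadUrl" : "https://URL/repository/doc/v&/PROJECT/SUBP/csc-dbc1-repo01x-x11-20210301/DIR/DIR2",
  "path" : "v&/PROJECT/SUBP/csc-dbc1-repo01x-x11-20210301/DIR/DIR2",
EOF

# For each line: find the leftmost match, print it, then keep scanning
# the rest of the line after that match (handles several IDs per line).
awk '{
  rest = $0
  while (match(rest, /csc-[[:alnum:]-]+/)) {
    print substr(rest, RSTART, RLENGTH)
    rest = substr(rest, RSTART + RLENGTH)
  }
}' "$sample"
```

This prints csc-dbc1-repo01x-x11-20210301 twice, once per input line, like the other answers.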

Answer:

You can use

grep -Eo -i 'csc-[a-z0-9]{4}-[a-z0-9]+-[a-z0-9]{3}-[0-9]{8}' file1

For example:

#!/bin/bash
s='"assets" : [ {
  "downloadUrl" : "https://URL/repository/doc/v&/PROJECT/SUBP/csc-dbc1-repo01x-x11-20210301/DIR/DIR2",
  "path" : "v&/PROJECT/SUBP/csc-dbc1-repo01x-x11-20210301/DIR/DIR2",'
grep -Eo -i 'csc-[a-z0-9]{4}-[a-z0-9]+-[a-z0-9]{3}-[0-9]{8}' <<< "$s"

Output:

csc-dbc1-repo01x-x11-20210301
csc-dbc1-repo01x-x11-20210301

Note the -o option, which outputs only the matched substrings, and -E, which enables POSIX ERE syntax. -i makes matching case-insensitive.
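This is also why the command in the question printed nothing. Without -E, grep uses basic regular expressions (BRE), where an unescaped { is an ordinary character, so {4} is searched for literally. Even under ERE, the question's pattern has two more mismatches with the sample: [a-z0-9] without + matches a single character (not repo01x), and [0-9]{3} cannot match x11. A small sketch on one line of the OP's extract:

```shell
#!/bin/sh
line='"path" : "v&/PROJECT/SUBP/csc-dbc1-repo01x-x11-20210301/DIR/DIR2",'

# Question's pattern as BRE: {4} is literal text, so nothing matches (prints 0).
printf '%s\n' "$line" | grep -c 'csc-[a-z0-9]{4}-[a-z0-9]-[0-9]{3}-[0-9]{8}'

# Same pattern as ERE: braces are quantifiers now, but the third segment
# allows one char and the fourth requires digits, so still 0.
printf '%s\n' "$line" | grep -Ec 'csc-[a-z0-9]{4}-[a-z0-9]-[0-9]{3}-[0-9]{8}'

# Corrected ERE: + allows repo01x and [a-z0-9]{3} accepts x11 (prints 1).
printf '%s\n' "$line" | grep -Ec 'csc-[a-z0-9]{4}-[a-z0-9]+-[a-z0-9]{3}-[0-9]{8}'

# Equivalent POSIX BRE: escaped interval braces, and X[X]* instead of X+ (prints 1).
printf '%s\n' "$line" | grep -c 'csc-[a-z0-9]\{4\}-[a-z0-9][a-z0-9]*-[a-z0-9]\{3\}-[0-9]\{8\}'
```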

Details:

  • csc- - a fixed string
  • [a-z0-9]{4} - four alphanumerics
  • - - a hyphen
  • [a-z0-9]+ - one or more alphanumeric chars
  • - - a hyphen
  • [a-z0-9]{3} - three alphanumerics
  • - - a hyphen
  • [0-9]{8} - eight digits.
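In the extract, each ID appears twice (once in downloadUrl, once in path), while the expected result lists each one once. Since the question asks to keep only these IDs in the file, one possible follow-up is to remove duplicates with awk '!seen[$0]++', which keeps the first occurrence and preserves order, unlike sort -u, and then replace the file through a temporary file. This is a sketch; the temporary sample file stands in for the OP's file1.

```shell
#!/bin/sh
# Demo setup: a temporary file holding two lines from the OP's extract.
file=$(mktemp)
cat > "$file" <<'EOF'
  "downloadUrl" : "https://URL/repository/doc/v&/PROJECT/SUBP/csc-dbc1-repo01x-x11-20210301/DIR/DIR2",
  "path" : "v&/PROJECT/SUBP/csc-dbc1-repo01x-x11-20210301/DIR/DIR2",
EOF

# grep prints every ID; awk drops repeated lines, keeping first-seen order.
# Writing to "$file.tmp" first matters: redirecting straight into "$file"
# would truncate it before grep had read it.
grep -Eoi 'csc-[a-z0-9]{4}-[a-z0-9]+-[a-z0-9]{3}-[0-9]{8}' "$file" |
  awk '!seen[$0]++' > "$file.tmp" && mv "$file.tmp" "$file"

cat "$file"   # csc-dbc1-repo01x-x11-20210301
```

Note that the && only checks the status of the last command in the pipeline (awk), so if grep finds no IDs the file still ends up empty.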

If you do not need that much precision, you can shorten the regex to

grep -o 'csc-[-[:alnum:]]*'
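To illustrate the precision trade-off with two made-up strings that do not follow the OP's ID format: the short pattern accepts anything after csc- made of letters, digits and hyphens, including a trailing hyphen, while the detailed ERE rejects both.

```shell
#!/bin/sh
# Hypothetical strings that are NOT valid IDs.
input='see /csc-notes-/ here
path/csc-x/y'

# Loose pattern: prints "csc-notes-" and "csc-x".
printf '%s\n' "$input" | grep -o 'csc-[-[:alnum:]]*'

# Detailed pattern: prints nothing for these strings.
printf '%s\n' "$input" | grep -Eoi 'csc-[a-z0-9]{4}-[a-z0-9]+-[a-z0-9]{3}-[0-9]{8}'
```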


Answer:

csc-[a-zA-Z0-9]{4}-[a-zA-Z0-9]+-[a-zA-Z0-9]{3}-[0-9]{8}

This one matches the examples you gave. Alternatively, you could match case-insensitively (grep -i), as Wiktor Stribiżew did, so you don't need to add uppercase ranges to the regex:

csc-[a-z0-9]{4}-[a-z0-9]+-[a-z0-9]{3}-[0-9]{8}
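Case matters here because the third ID in the expected result, csc-cbc1-revehq1-A11-20210101, contains an uppercase A. A quick check against the three IDs from the question, written to a temporary file for the demo. LC_ALL=C is set so that [a-z] means exactly the ASCII lowercase letters regardless of the user's locale.

```shell
#!/bin/sh
export LC_ALL=C
ids=$(mktemp)
printf '%s\n' csc-dbc1-repo01x-x11-20210301 csc-dvc1-rmco01x-x12-20220104 \
  csc-cbc1-revehq1-A11-20210101 > "$ids"

# Lowercase-only classes without -i: the A11 ID is missed (prints 2).
grep -Ec 'csc-[a-z0-9]{4}-[a-z0-9]+-[a-z0-9]{3}-[0-9]{8}' "$ids"

# Explicit A-Z ranges: all three IDs match (prints 3).
grep -Ec 'csc-[a-zA-Z0-9]{4}-[a-zA-Z0-9]+-[a-zA-Z0-9]{3}-[0-9]{8}' "$ids"

# Lowercase classes plus -i: all three IDs match (prints 3).
grep -Eci 'csc-[a-z0-9]{4}-[a-z0-9]+-[a-z0-9]{3}-[0-9]{8}' "$ids"
```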