I have a file that contains various information. I'm trying to delete all of its content except for a specific pattern, which I want as a list.
The pattern is like this:
/[csc]-[ALPHANUM of 4]-[ALPHANUM]-[ALPHANUM of 3]-[NUM of 8]/
The pattern always has csc at the start, but it can appear in the middle of a line.
It has / at the start and sometimes at the end (if that helps), but I just need what's between.
Example :
/csc-dbc1-repo01x-x11-20210101/
I tried something like this:
grep 'csc-[a-z0-9]{4}-[a-z0-9]-[0-9]{3}-[0-9]{8}' file1
But it returns nothing. Is there any way I can list these patterns from the file?
Expected result :
csc-dbc1-repo01x-x11-20210301
csc-dvc1-rmco01x-x12-20220104
csc-cbc1-revehq1-A11-20210101
Extract of lines from the file containing a pattern :
"assets" : [ {
"downloadUrl" : "https://URL/repository/doc/v&/PROJECT/SUBP/csc-dbc1-repo01x-x11-20210301/DIR/DIR2",
"path" : "v&/PROJECT/SUBP/csc-dbc1-repo01x-x11-20210301/DIR/DIR2",
CodePudding user response:
Using sed
$ sed -En 's/.*(csc-[[:alnum:]-]+).*/\1/p' input_file
csc-dbc1-repo01x-x11-20210301
csc-dbc1-repo01x-x11-20210301
CodePudding user response:
With GNU awk, you could try the following code, written and tested with the shown samples only. A simple explanation: the regex (\\/|\n|[[:space:]])csc(-[[:alnum:]]+)+ is set as awk's RS variable, so each match of the required pattern ends up in RT. The main program then strips the leading separator from RT and prints only the required part.
awk -v RS='(\\/|\n|[[:space:]])csc(-[[:alnum:]]+)+' '
RT{
sub(/^\/|[[:space:]]|\n/,"",RT)
print RT
}
' Input_file
CodePudding user response:
You can use
grep -Eo -i 'csc-[a-z0-9]{4}-[a-z0-9]+-[a-z0-9]{3}-[0-9]{8}'
See the online demo:
#!/bin/bash
s='"assets" : [ {
"downloadUrl" : "https://URL/repository/doc/v&/PROJECT/SUBP/csc-dbc1-repo01x-x11-20210301/DIR/DIR2",
"path" : "v&/PROJECT/SUBP/csc-dbc1-repo01x-x11-20210301/DIR/DIR2",'
grep -Eo -i 'csc-[a-z0-9]{4}-[a-z0-9]+-[a-z0-9]{3}-[0-9]{8}' <<< "$s"
Output:
csc-dbc1-repo01x-x11-20210301
csc-dbc1-repo01x-x11-20210301
Note the use of the -o option, which extracts the matched substrings, and -E, which enables the POSIX ERE regex syntax. -i makes matching case insensitive.
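As a small illustration of why -E matters here, the sketch below counts matching lines with and without it. The sample line and the variable names bre_count and ere_count are made up for this example; the behaviour shown assumes GNU grep, where braces are literal characters in basic regex mode.

```shell
#!/bin/bash
# Hypothetical sample line, modelled on the extract in the question.
line='v&/PROJECT/SUBP/csc-dbc1-repo01x-x11-20210301/DIR/DIR2'

# Without -E, grep uses basic regex (BRE): { and } are literal characters,
# so the pattern looks for the text "{4}" and finds nothing.
# "|| true" keeps the assignment from failing when grep finds no match.
bre_count=$(printf '%s\n' "$line" | grep -c 'csc-[a-z0-9]{4}-' || true)

# With -E, {4} is an interval meaning "exactly four of the preceding item".
ere_count=$(printf '%s\n' "$line" | grep -Ec 'csc-[a-z0-9]{4}-' || true)

echo "BRE matches: $bre_count, ERE matches: $ere_count"
```

This is likely one reason the grep in the question returned nothing; the other is that its pattern allows only a single character in the third field and only digits in the fourth.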
Details:
csc- - a fixed string
[a-z0-9]{4} - four alphanumerics
- - a hyphen
[a-z0-9]+ - one or more alphanumeric chars
- - a hyphen
[a-z0-9]{3} - three alphanumerics
- - a hyphen
[0-9]{8} - eight digits.
If you do not need that matching precision, you can even shorten/simplify the regex to
grep -o 'csc-[-[:alnum:]]*'
See this online demo.
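To make the trade-off concrete, here is a rough comparison of the strict and the simplified pattern on a made-up input. The sample string and the token csc-notes-draft are invented for this sketch and do not come from the question's file.

```shell
#!/bin/bash
# Hypothetical input: one real-looking identifier and one unrelated csc- token.
sample='x/csc-dbc1-repo01x-x11-20210301/y and csc-notes-draft'

# Strict pattern: enforces the field lengths described above.
strict=$(printf '%s\n' "$sample" | grep -Eoi 'csc-[a-z0-9]{4}-[a-z0-9]+-[a-z0-9]{3}-[0-9]{8}')

# Simplified pattern: accepts any run of alphanumerics and hyphens after "csc-".
loose=$(printf '%s\n' "$sample" | grep -o 'csc-[-[:alnum:]]*')

printf 'strict:\n%s\n' "$strict"
printf 'loose:\n%s\n' "$loose"
```

The strict pattern returns only the full identifier, while the simplified one also returns csc-notes-draft, so the shorter regex is only safe when nothing else in the file starts with csc-.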
CodePudding user response:
csc-[a-zA-Z0-9]{4}-[a-zA-Z0-9]+-[a-zA-Z0-9]{3}-[0-9]{8}
This one matches the examples you gave. Alternatively, you could match case-insensitively, as Wiktor Stribiżew did, so that you don't have to list both cases in the regexp:
csc-[a-z0-9]{4}-[a-z0-9]+-[a-z0-9]{3}-[0-9]{8}
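Because the same identifier shows up on several lines of the file (both downloadUrl and path in the extract), the raw matches contain duplicates. A minimal sketch of getting a unique list, assuming that is what is wanted; the function name extract_csc_ids is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: print each distinct csc identifier found in the given file.
extract_csc_ids() {
  # -E: extended regex, -o: print only the matched text, -i: ignore case.
  # sort -u then collapses repeated identifiers into one line each.
  grep -Eoi 'csc-[a-z0-9]{4}-[a-z0-9]+-[a-z0-9]{3}-[0-9]{8}' "$1" | sort -u
}
```

It would be called as extract_csc_ids file1, with file1 being the file name used in the question.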