I need to get a list of unique client computer names/IP addresses that access a server, taken from the server's access logs.
The target log line looks like this:
2020-11-17 15:34:04.208 -0500 Information 94 XYZ-ASDF-FMP123 Client "%USERNAME% (QWER-L1212-W6) [11.22.333.44]" opening database "databasename" as "username".
In this example, the string (QWER-L1212-W6) [11.22.333.44] would be an example of a unique instance of a client computer/IP address.
So the result would be something like this:
(QWER-L1212-W6) [11.22.333.44]
(QWER-L1234-W7) [11.22.333.55]
etc...
I tried this without success:
grep --only-matching '\(.+\) \[.+\]' | sort --unique Access.log
The matching fails and the entire log line is returned.
CodePudding user response:
Note that you are using the POSIX BRE regex flavor, since you pass neither -E nor -P to change the regex flavor from the default one. In POSIX BRE, \(...\) defines a capturing group, while unescaped ( and ) match literal parentheses. There are more issues here, though.
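To make the BRE/ERE difference concrete, here is a small sketch (the variable names are only for illustration, and the group test string abab is made up). It shows that unescaped parentheses are literal in BRE, that the same literal match needs backslashes under -E, and that \(...\) in BRE really is a group that a backreference can reuse:

```shell
# Client fragment copied from the log line in the question.
s='(QWER-L1212-W6) [11.22.333.44]'

# BRE (grep's default): unescaped ( and ) are literal characters.
bre=$(printf '%s\n' "$s" | grep -o '([^()]*)')

# ERE (-E): the roles flip, so literal parentheses must be escaped.
ere=$(printf '%s\n' "$s" | grep -oE '\([^()]*\)')

# BRE: \( \) is a capturing group; \1 repeats what the group matched.
grp=$(printf 'abab\n' | grep -o '\(ab\)\1')

printf '%s\n%s\n%s\n' "$bre" "$ere" "$grp"
```

The first two commands print the same (QWER-L1212-W6) fragment, and the third prints abab.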
You need to use
grep -o '([^()]*) \[[^][]*]' Access.log | sort -u
Note the location of the input file argument: it must be passed to grep, not to sort.
The ([^()]*) \[[^][]*] here is a POSIX BRE pattern that matches
( - a literal ( char (a \( is the start of a capturing group)
[^()]* - zero or more chars other than ( and )
) - a literal ) char (a \) is the end of a capturing group)
\[ - a [ char
[^][]* - zero or more chars other than [ and ]
] - a ] char.
See the online demo:
#!/bin/bash
s='2020-11-17 15:34:04.208 -0500 Information 94 XYZ-ASDF-FMP123 Client "%USERNAME% (QWER-L1212-W6) [11.22.333.44]" opening database "databasename" as "username".'
grep -o '([^()]*) \[[^][]*]' <<< "$s" | sort -u
# => (QWER-L1212-W6) [11.22.333.44]
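The demo above feeds a single string to grep. As a rough sketch of the same command run against a file, the snippet below builds a throwaway log with mktemp. The second and third lines are made-up sample data: a repeat of the first client, and a line for the second client named in the question, with invented timestamps.

```shell
# Temporary stand-in for Access.log; only the first line is from the question.
log=$(mktemp)
cat > "$log" <<'EOF'
2020-11-17 15:34:04.208 -0500 Information 94 XYZ-ASDF-FMP123 Client "%USERNAME% (QWER-L1212-W6) [11.22.333.44]" opening database "databasename" as "username".
2020-11-17 15:35:10.101 -0500 Information 94 XYZ-ASDF-FMP123 Client "%USERNAME% (QWER-L1234-W7) [11.22.333.55]" opening database "databasename" as "username".
2020-11-17 15:36:20.300 -0500 Information 94 XYZ-ASDF-FMP123 Client "%USERNAME% (QWER-L1212-W6) [11.22.333.44]" opening database "databasename" as "username".
EOF

# grep reads the file and emits only the matched fragments;
# sort -u then collapses the repeated client into one line.
unique=$(grep -o '([^()]*) \[[^][]*]' "$log" | sort -u)
printf '%s\n' "$unique"

rm -f "$log"
```

With this sample input, each of the two clients is listed once even though the first one appears twice in the file.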
CodePudding user response:
grep --only-matching '\(.+\) \[.+\]' file.log
This is failing because you are not using ERE (extended regex, the -E option) in grep and + is not escaped. So for your case the following may work:
grep -E --only-matching '\(.+\) \[.+\]' file.log
However, this regex is problematic because .+ is greedy: it matches one or more of any character, as many as possible, before matching the closing ) and the closing ]. If a line in your log contains more than one (...) [...] substring, like the second line here:
2020-11-17 15:34:04.208 -0500 Information 94 XYZ-ASDF-FMP123 Client "%USERNAME% (QWER-L1212-W6) [11.22.333.44]" opening database "databasename" as "username".
2020-11-17 15:34:04.208 -0500 Information 94 XYZ-ASDF-FMP123 Client "%USERNAME% (QWER-L1212-W6) [21.22.333.33]" opening database "databasename" as "username" (QWER-L1234-W7) [11.22.333.55]
Then you will get incorrect results. The pattern '([^()]*) \[[^][]*]' also gives incorrect results here, because it reports the trailing (...) [...] substring as well.
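A quick way to see both failure modes is to run the two patterns against the second sample line. This is only a sketch: it assumes the quantifier in the question's pattern was +, and the variable names are for illustration.

```shell
# Second sample line from above: it contains two (...) [...] substrings.
line='2020-11-17 15:34:04.208 -0500 Information 94 XYZ-ASDF-FMP123 Client "%USERNAME% (QWER-L1212-W6) [21.22.333.33]" opening database "databasename" as "username" (QWER-L1234-W7) [11.22.333.55]'

# Greedy ERE pattern: one match, running from the first ( to the last ].
greedy=$(printf '%s\n' "$line" | grep -oE '\(.+\) \[.+\]')

# Negated-bracket BRE pattern: two matches, the second of which
# is not the client field.
negated=$(printf '%s\n' "$line" | grep -o '([^()]*) \[[^][]*]')

printf 'greedy:\n%s\n\nnegated:\n%s\n' "$greedy" "$negated"
```

The greedy pattern swallows everything between the two substrings, while the negated-bracket pattern returns the stray trailing substring as if it were a second client.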
Since you are parsing an access log, where the format and the positions of fields are fixed, it is much safer and more efficient to use awk for this extraction, like this:
awk -F '"' '{sub(/^[^ ]* /, "", $2); print $2}' file.log
(QWER-L1212-W6) [11.22.333.44]
(QWER-L1212-W6) [21.22.333.33]
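The awk command above prints one entry per log line, so repeated clients would show up repeatedly. One possible way to get the unique list the question asks for is to let awk track what it has already printed. This is a sketch, not the only option (piping to sort -u works too); the seen array name is arbitrary, and the third line of the sample file is a made-up repeat of the first.

```shell
# Throwaway log: the two sample lines from above plus a repeat of the first.
log=$(mktemp)
cat > "$log" <<'EOF'
2020-11-17 15:34:04.208 -0500 Information 94 XYZ-ASDF-FMP123 Client "%USERNAME% (QWER-L1212-W6) [11.22.333.44]" opening database "databasename" as "username".
2020-11-17 15:34:04.208 -0500 Information 94 XYZ-ASDF-FMP123 Client "%USERNAME% (QWER-L1212-W6) [21.22.333.33]" opening database "databasename" as "username" (QWER-L1234-W7) [11.22.333.55]
2020-11-17 15:34:04.208 -0500 Information 94 XYZ-ASDF-FMP123 Client "%USERNAME% (QWER-L1212-W6) [11.22.333.44]" opening database "databasename" as "username".
EOF

# Split on double quotes so $2 is the client field, strip the leading
# user name, and print each remaining value only the first time it is seen.
clients=$(awk -F '"' '{ sub(/^[^ ]* /, "", $2); if (!seen[$2]++) print $2 }' "$log")
printf '%s\n' "$clients"

rm -f "$log"
```

Unlike sort -u, this keeps the entries in the order they first appear in the log, and the trailing (QWER-L1234-W7) [11.22.333.55] substring is ignored because it sits outside the first quoted field.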