Why do these two grep commands produce different results?

Time: 09-26

$ grep "^底线$" query_20220922  | wc -l
95701
$ grep -iF "底线" query_20220922  | wc -l
796591

Shouldn't the counts be exactly the same? I want to count the lines that match the string exactly.

Answer:

-F treats the pattern as a fixed string and matches it anywhere in a line. ^xyz$ matches only lines that consist of exactly "xyz" and nothing else.

You are looking for -x/--line-regexp and not -F/--fixed-strings.

To match lines that consist of exactly your search text, with nothing else on the line and without interpreting the search text as a regular expression, combine the two flags: grep -xF 'findme' file.txt.
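To see the difference concretely, here is a minimal sketch using a made-up five-line sample file (the real query_20220922 is not available, and the variable names are only illustrative). It counts matches for the anchored regex, for -F alone, and for -xF.

```shell
# Made-up sample data; stands in for the real query_20220922 file.
sample=$(mktemp)
printf '%s\n' '底线' '底线测试' '这是底线' '底线' 'other' > "$sample"

anchored=$(grep -c '^底线$' "$sample")   # regex anchored to the whole line
fixed=$(grep -cF '底线' "$sample")       # fixed string anywhere in the line
whole=$(grep -cxF '底线' "$sample")      # fixed string, whole line only

echo "anchored=$anchored fixed=$fixed whole=$whole"
rm -f "$sample"
```

On this sample, -F alone also counts '底线测试' and '这是底线', so it reports 4 while the other two report 2.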

Also, case-insensitive matching (-i) can match more lines than case-sensitive matching (the default).
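A small illustration with ASCII text, where case does exist. The sample lines are invented, and 'findme' is the placeholder pattern from the command above.

```shell
# Invented sample lines that differ only in letter case.
sample=$(mktemp)
printf '%s\n' 'findme' 'FindMe' 'FINDME' 'other' > "$sample"

sensitive=$(grep -cxF 'findme' "$sample")     # default: case-sensitive
insensitive=$(grep -cixF 'findme' "$sample")  # -i: case-insensitive

echo "case-sensitive=$sensitive case-insensitive=$insensitive"
rm -f "$sample"
```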

Answer:

No, they do different things. The first uses a regular expression to search for "底线" alone on an input line (^ in a regular expression means beginning of line, and $ means end of line).

The second searches for the string anywhere on an input line. The -i flag does nothing here: it selects case-insensitive matching, but Chinese characters have no upper- and lowercase forms, so it is effectively a no-op. The -F flag says to search for the string literally; that can make the search faster for internal reasons, but it doesn't change the meaning of a search string that contains no regex metacharacters.
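One way to check that -i and -F don't change the result for this pattern is to count the same CJK substring with and without them. This is a sketch on an invented sample file, not on the asker's data.

```shell
# Invented sample; every line but the last contains 底线 somewhere.
sample=$(mktemp)
printf '%s\n' '底线' '底线测试' '这是底线' 'other' > "$sample"

plain=$(grep -c '底线' "$sample")     # basic regex, case-sensitive
with_i=$(grep -ci '底线' "$sample")   # -i: no case to fold for these characters
with_f=$(grep -cF '底线' "$sample")   # -F: no metacharacters, same meaning
with_if=$(grep -ciF '底线' "$sample") # both flags, as in the question

echo "plain=$plain -i=$with_i -F=$with_f -iF=$with_if"
rm -f "$sample"
```

All four counts should agree; only the ^...$ anchors in the first command of the question change what matches.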

It should be easy to see how they differ. In a large input file, the differences may be hard to find if the differing lines are not conveniently mixed in near the top; but for a quick start, try

diff -u <(grep -m5 "^底线$" query_20220922) <(grep -Fi -m5 "底线" query_20220922)

where -m5 stops after the first five matching lines. (If the differences are all near the end of the file, for example, try a different range, perhaps with tail.)
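Another way to look at the difference, sketched here on an invented sample file, is to keep the lines that contain the string and then drop those that are exactly the string with grep -vxF. Because every exact match also contains the substring, the leftover count should equal the difference between the two counts; on the file in the question, assuming -i really has no effect, that would be 796591 - 95701 = 700890 lines.

```shell
# Invented sample; the real file would be query_20220922.
sample=$(mktemp)
printf '%s\n' '底线' '底线测试' '这是底线' '底线' > "$sample"

# Print up to five lines that contain 底线 but are not exactly 底线.
grep -F '底线' "$sample" | grep -vxF '底线' | head -n 5

total=$(grep -cF '底线' "$sample")    # lines containing the string
exact=$(grep -cxF '底线' "$sample")   # lines that are exactly the string
extra=$(grep -F '底线' "$sample" | grep -cvxF '底线')  # the rest

echo "total=$total exact=$exact extra=$extra"
rm -f "$sample"
```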

Tangentially, you usually want to replace the pipe to wc -l with grep -c; also, you might want to try grep -Fx "底线" as a faster alternative to the first search.
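A quick sketch comparing the three forms on an invented sample file. The tr -d ' ' only strips the padding that some wc implementations add; the variable names are illustrative.

```shell
# Invented sample; two lines are exactly 底线.
sample=$(mktemp)
printf '%s\n' '底线' '底线测试' '底线' > "$sample"

via_wc=$(grep '^底线$' "$sample" | wc -l | tr -d ' ')  # original pipeline
via_c=$(grep -c '^底线$' "$sample")                    # grep counts itself
via_fx=$(grep -cFx '底线' "$sample")                   # fixed string, whole line

echo "wc=$via_wc -c=$via_c -Fx=$via_fx"
rm -f "$sample"
```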
