$ grep "^底线$" query_20220922 | wc -l
95701
$ grep -iF "底线" query_20220922 | wc -l
796591
Shouldn't the counts be exactly the same? I want to count exact matches of the string.
CodePudding user response:
-F matches a fixed string anywhere in a line. ^xyz$ matches lines which contain "xyz" exactly (nothing else).
You are looking for -x/--line-regexp and not -F/--fixed-strings.
To match lines which contain your search text exactly, without anything else and without interpreting your search text as a regular expression, combine the two flags: grep -xF 'findme' file.txt
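As a rough illustration of how the flags combine, here is a minimal sketch on a small made-up file. The sample lines and the tmp variable are assumptions for the demo, not the asker's data.

```shell
# Hypothetical sample data, written to a temporary file
tmp=$(mktemp)
printf '%s\n' '底线' '底线 是什么' '我的底线' '底线' 'a.c' 'abc' > "$tmp"

whole_line=$(grep -c -xF '底线' "$tmp")   # lines that are exactly 底线
substring=$(grep -c -F '底线' "$tmp")     # lines containing 底线 anywhere
echo "whole line: $whole_line, substring: $substring"

# -F also keeps "." from acting as a regex wildcard
regex_dot=$(grep -c -x 'a.c' "$tmp")      # counts both a.c and abc
fixed_dot=$(grep -c -xF 'a.c' "$tmp")     # counts only the literal a.c
echo "regex dot: $regex_dot, fixed dot: $fixed_dot"

rm -f "$tmp"
```

On this sample, the whole-line count is smaller than the substring count, which is the same kind of gap seen in the question.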
Also, case-insensitive matching (-i) can match more lines than case-sensitive matching (the default).
CodePudding user response:
No, they do different things. The first uses a regular expression to search for "底线" alone on an input line (^ in a regular expression means beginning of line, and $ means end of line).
The second searches for the string anywhere on an input line. The -i flag does nothing useful here (it selects case-insensitive matching, but CJK characters have no upper or lower case, so it is basically a no-op) and -F says to search literally (which makes the search faster for internal reasons, but doesn't change the semantics of a search string which doesn't contain any regex metacharacters).
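To see when -i matters and when it does not, a small sketch on invented sample lines may help. The file contents and variable names here are assumptions for illustration only.

```shell
# Hypothetical sample data: two spellings of an ASCII word plus one CJK line
tmp=$(mktemp)
printf '%s\n' 'Floor' 'floor' '底线' > "$tmp"

ascii_cs=$(grep -c -F 'floor' "$tmp")    # case-sensitive: only the lowercase line
ascii_ci=$(grep -c -iF 'floor' "$tmp")   # case-insensitive: both spellings
cjk_cs=$(grep -c -F '底线' "$tmp")       # CJK text without -i
cjk_ci=$(grep -c -iF '底线' "$tmp")      # CJK text with -i: same count
echo "ascii: $ascii_cs vs $ascii_ci, cjk: $cjk_cs vs $cjk_ci"

rm -f "$tmp"
```

For the ASCII word the two counts differ, while for the CJK string they should come out the same, so -i is unlikely to explain the difference in the question.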
It should be easy to see how they differ. For a large input file, it might be a bit challenging to find the differences if they are not conveniently mixed; but for a quick start, try
diff -u <(grep -m5 "^底线$" query_20220922) <(grep -Fi -m5 "底线" query_20220922)
where -m5 picks out the first five matches. (Try a different range, perhaps with tail, if the differences are all near the end of the file, for example.)
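A different way to surface the extra lines, not the approach suggested above, is to filter the substring matches through a second grep that drops the exact whole-line matches. This is only a sketch on made-up sample data. With the real file you would replace "$tmp" with query_20220922 and probably pipe the result to head.

```shell
# Hypothetical sample data
tmp=$(mktemp)
printf '%s\n' '底线' '我的底线' '底线 是什么' '底线' > "$tmp"

# Lines that contain the text but are not exactly the text:
# -v inverts the match, -x requires a whole-line match, -F takes the pattern literally
grep -F '底线' "$tmp" | grep -vxF '底线'

# The same filter, counted instead of printed
extra_count=$(grep -F '底线' "$tmp" | grep -c -vxF '底线')
echo "extra lines: $extra_count"

rm -f "$tmp"
```

The printed lines are exactly the ones the second command in the question counts and the first does not.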
Tangentially, you usually want to replace the pipe to wc -l with grep -c; also, you might want to try grep -Fx "底线" as a faster alternative to the first search.
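A quick sanity check of both suggestions, again on invented sample data (the temporary file and variable names are assumptions), could look like this:

```shell
# Hypothetical sample data
tmp=$(mktemp)
printf '%s\n' '底线' '我的底线' '底线' > "$tmp"

via_wc=$(grep '^底线$' "$tmp" | wc -l)   # original style: pipe the matches to wc -l
via_c=$(grep -c '^底线$' "$tmp")         # grep counts the matching lines itself
via_fx=$(grep -c -Fx '底线' "$tmp")      # fixed-string, whole-line variant
echo "wc -l: $via_wc, grep -c: $via_c, grep -c -Fx: $via_fx"

rm -f "$tmp"
```

All three should report the same number. Note that some wc implementations pad their output with spaces, so compare the results numerically rather than as strings.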