I am compiling C++ code with Clang (Apple clang version 12.0.5 (clang-1205.0.22.11)).
Clang can give tips in case you misspell a variable:
#include <iostream>

int main() {
    int my_int;
    std::cout << my_it << std::endl;
}
spellcheck-test.cpp:5:18: error: use of undeclared identifier 'my_it'; did you mean 'my_int'?
    std::cout << my_it << std::endl;
                 ^~~~~
                 my_int
spellcheck-test.cpp:4:9: note: 'my_int' declared here
    int my_int;
        ^
1 error generated.
My question is:
What is the criterion Clang uses to determine when to suggest another variable?
My experimentation suggests it is quite sophisticated:
- If there is another similarly named variable that you might have meant (e.g. int my_in;), it does not give a suggestion.
- If the suggested variable has the wrong type for the operation (e.g. by trying to print my_it.size() instead), it does not give a suggestion.
- Whether or not it gives the suggestion depends on a non-trivial comparison of variable names: it allows for both deletions and insertions of characters, and longer variable names allow for more insertions/deletions to be considered "similar".
CodePudding user response:
You will not likely find it documented, but as Clang is open-source you can turn to the source to try to figure it out.
Clangd?
The particular diagnostic (from DiagnosticSemaKinds.td):
def err_undeclared_var_use_suggest : Error<
  "use of undeclared identifier %0; did you mean %1?">;
is only ever referred to from clang-tools-extra/clangd/IncludeFixer.cpp:
// Try to fix unresolved name caused by missing declaration.
// E.g.
//   clang::SourceManager SM;
//          ~~~~~~~~~~~~~
//          UnresolvedName
//   or
//   namespace clang { SourceManager SM; }
//                     ~~~~~~~~~~~~~
//                     UnresolvedName
// We only attempt to recover a diagnostic if it has the same location as
// the last seen unresolved name.
if (DiagLevel >= DiagnosticsEngine::Error &&
    LastUnresolvedName->Loc == Info.getLocation())
  return fixUnresolvedName();
Now, clangd is a language server, and t.b.h. I don't know whether this is actually used by the Clang compiler frontend to yield certain diagnostics, but you're free to continue down the rabbit hole to tie these details together. The fixUnresolvedName above eventually performs a fuzzy search:
if (llvm::Optional<const SymbolSlab *> Syms = fuzzyFindCached(Req))
  return fixesForSymbols(**Syms);
If you want to dig into the details, I would recommend starting with the fuzzyFindCached function:
llvm::Optional<const SymbolSlab *>
IncludeFixer::fuzzyFindCached(const FuzzyFindRequest &Req) const {
  auto ReqStr = llvm::formatv("{0}", toJSON(Req)).str();
  auto I = FuzzyFindCache.find(ReqStr);
  if (I != FuzzyFindCache.end())
    return &I->second;
  if (IndexRequestCount >= IndexRequestLimit)
    return llvm::None;
  IndexRequestCount++;
  SymbolSlab::Builder Matches;
  Index.fuzzyFind(Req, [&](const Symbol &Sym) {
    if (Sym.Name != Req.Query)
      return;
    if (!Sym.IncludeHeaders.empty())
      Matches.insert(Sym);
  });
  auto Syms = std::move(Matches).build();
  auto E = FuzzyFindCache.try_emplace(ReqStr, std::move(Syms));
  return &E.first->second;
}
along with the type of its single function parameter, FuzzyFindRequest, in clang/index/Index.h:
struct FuzzyFindRequest {
  /// A query string for the fuzzy find. This is matched against symbols'
  /// un-qualified identifiers and should not contain qualifiers like "::".
  std::string Query;
  /// If this is non-empty, symbols must be in at least one of the scopes
  /// (e.g. namespaces) excluding nested scopes. For example, if a scope "xyz::"
  /// is provided, the matched symbols must be defined in namespace xyz but not
  /// namespace xyz::abc.
  ///
  /// The global scope is "", a top level scope is "foo::", etc.
  std::vector<std::string> Scopes;
  /// If set to true, allow symbols from any scope. Scopes explicitly listed
  /// above will be ranked higher.
  bool AnyScope = false;
  /// The number of top candidates to return. The index may choose to
  /// return more than this, e.g. if it doesn't know which candidates are best.
  llvm::Optional<uint32_t> Limit;
  /// If set to true, only symbols for completion support will be considered.
  bool RestrictForCodeCompletion = false;
  /// Contextually relevant files (e.g. the file we're code-completing in).
  /// Paths should be absolute.
  std::vector<std::string> ProximityPaths;
  /// Preferred types of symbols. These are raw representation of `OpaqueType`.
  std::vector<std::string> PreferredTypes;

  bool operator==(const FuzzyFindRequest &Req) const {
    return std::tie(Query, Scopes, Limit, RestrictForCodeCompletion,
                    ProximityPaths, PreferredTypes) ==
           std::tie(Req.Query, Req.Scopes, Req.Limit,
                    Req.RestrictForCodeCompletion, Req.ProximityPaths,
                    Req.PreferredTypes);
  }
  bool operator!=(const FuzzyFindRequest &Req) const { return !(*this == Req); }
};
Other rabbit holes?
The following commit may be another leg to start from:
This can be used to append alternative typo corrections to an existing diag. include-fixer can use it to suggest includes to be added.
Differential Revision: https://reviews.llvm.org/D26745
from which we may end up in clang/include/clang/Sema/TypoCorrection.h, which sounds like a feature more likely to be used by the compiler frontend than the one in clangd (a clang-tools-extra tool). E.g.:
/// Gets the "edit distance" of the typo correction from the typo.
/// If Normalized is true, scale the distance down by the CharDistanceWeight
/// to return the edit distance in terms of single-character edits.
unsigned getEditDistance(bool Normalized = true) const {
  if (CharDistance > MaximumDistance || QualifierDistance > MaximumDistance ||
      CallbackDistance > MaximumDistance)
    return InvalidDistance;
  unsigned ED =
      CharDistance * CharDistanceWeight +
      QualifierDistance * QualifierDistanceWeight +
      CallbackDistance * CallbackDistanceWeight;
  if (ED > MaximumDistance)
    return InvalidDistance;
  // Half the CharDistanceWeight is added to ED to simulate rounding since
  // integer division truncates the value (i.e. round-to-nearest-int instead
  // of round-to-zero).
  return Normalized ? NormalizeEditDistance(ED) : ED;
}
used in clang/lib/Sema/SemaDecl.cpp:
// Callback to only accept typo corrections that have a non-zero edit distance.
// Also only accept corrections that have the same parent decl.
class DifferentNameValidatorCCC final : public CorrectionCandidateCallback {
 public:
  DifferentNameValidatorCCC(ASTContext &Context, FunctionDecl *TypoFD,
                            CXXRecordDecl *Parent)
      : Context(Context), OriginalFD(TypoFD),
        ExpectedParent(Parent ? Parent->getCanonicalDecl() : nullptr) {}

  bool ValidateCandidate(const TypoCorrection &candidate) override {
    if (candidate.getEditDistance() == 0)
      return false;
    // ...
  }
  // ...
};
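To make the rounding comment in getEditDistance a bit more concrete, here is a minimal standalone sketch of the "weighted sum, then round-to-nearest division" idea. The three weight values and the function names (weightedDistance, normalizeRounded, normalizeTruncated) are assumptions made up for illustration. They are not taken from the snippet above, and NormalizeEditDistance itself is not shown in this answer, so treat this as one plausible reading of the comment, not as Clang's actual code.

```cpp
// Assumed weights, chosen only for illustration. The real constants are
// defined in TypoCorrection.h and are not quoted in this answer.
const unsigned kCharWeight = 100;
const unsigned kQualifierWeight = 110;
const unsigned kCallbackWeight = 150;

// Combine the three distances into one weighted score, mirroring the shape
// of the sum in getEditDistance.
unsigned weightedDistance(unsigned chars, unsigned qualifiers,
                          unsigned callbacks) {
  return chars * kCharWeight + qualifiers * kQualifierWeight +
         callbacks * kCallbackWeight;
}

// Scale a weighted score back to "single-character edits", rounding to the
// nearest integer by adding half the divisor before dividing.
unsigned normalizeRounded(unsigned weighted) {
  return (weighted + kCharWeight / 2) / kCharWeight;
}

// Plain integer division, which truncates toward zero, for comparison.
unsigned normalizeTruncated(unsigned weighted) {
  return weighted / kCharWeight;
}
```

With these assumed weights, one character edit plus one callback penalty gives a weighted score of 250. Truncating division reports that as 2 edits, while the rounded version reports 3, which is the difference the source comment is describing.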
CodePudding user response:
I would recommend checking out this 10-year-old blog post by Chris Lattner for a general idea of Clang's error recovery mechanisms.
On Clang's Spell Checker, he writes:
One of the more visible things that Clang includes is a spell checker (also on reddit). The spell checker kicks in when you use an identifier that Clang doesn't know: it checks against other close identifiers and suggests what you probably meant.
...
Clang uses the well known Levenshtein distance function to compute the best match out of the possible candidates.
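For readers who have not seen it, the Levenshtein distance mentioned in the quote can be sketched in a few lines. The suggest helper below is hypothetical: its budget of roughly one edit per three characters and its "give up on ties" rule are assumptions picked to mimic the behaviour observed in the question (longer names tolerate more edits, and a second close candidate suppresses the hint). It is not a transcription of what Clang does.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Classic Levenshtein distance: the minimum number of single-character
// insertions, deletions, and substitutions needed to turn `a` into `b`.
std::size_t levenshtein(const std::string &a, const std::string &b) {
  std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j)
    prev[j] = j;  // turning "" into b[0..j) takes j insertions
  for (std::size_t i = 1; i <= a.size(); ++i) {
    cur[0] = i;  // turning a[0..i) into "" takes i deletions
    for (std::size_t j = 1; j <= b.size(); ++j) {
      std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
      cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, subst});
    }
    std::swap(prev, cur);
  }
  return prev[b.size()];
}

// Hypothetical helper: returns the single closest candidate, or an empty
// string if nothing is close enough or if two candidates tie for best.
std::string suggest(const std::string &typo,
                    const std::vector<std::string> &candidates) {
  // Assumed budget: about one edit per three characters of the typo.
  const std::size_t budget = (typo.size() + 2) / 3;
  std::string best;
  std::size_t bestDist = budget + 1;
  bool ambiguous = false;
  for (const std::string &c : candidates) {
    std::size_t d = levenshtein(typo, c);
    if (d == 0 || d > budget)
      continue;  // identical name or too far away
    if (d < bestDist) {
      best = c;
      bestDist = d;
      ambiguous = false;
    } else if (d == bestDist) {
      ambiguous = true;  // two equally good guesses
    }
  }
  return ambiguous ? std::string() : best;
}
```

Under these assumptions, suggest("my_it", {"my_int"}) yields "my_int", while adding "my_in" to the candidate list produces a tie at distance 1 and therefore no suggestion, in line with the first observation in the question.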