How to get all global variable addresses and size at runtime through llvm or clang

Time:11-16

I'm analyzing C/C++ projects to track memory errors (out-of-bounds reads and writes). I would like to build, at runtime, a list of the addresses of all global variables, i.e. their boundaries. Is there any workaround with LLVM (e.g. an LLVM module pass) that I can come up with, so that at runtime I can locate all global variables and their corresponding sizes?

The desired outcome is described in the following C pseudocode.

// Example of file.cc
int i;
int a[3] = {0, 1, 2};
char *s = "Simple string";

SOME_LIST_TYPE global_list;

void track_global_vars() {
  for (GLOBAL_VAR gv: GLOBAL_VAR gvs) {
    LIST_ITEM *li = (LIST_ITEM*) malloc(sizeof(LIST_ITEM));
    li->start = gv.getAddress();
    li->end   = li->start + gv.getSize();
    global_list.add(li);
  }
}

int main(int argc, char *argv[]) {
  track_global_vars();
  // AT this point I would like to have:
  // global_list -> [i_start, i_end] -> [a_start, a_end] -> [s_start, s_end] -> ...

  // normal program execution
  return 0;
}

Any suggestion or workarounds?

CodePudding user response:

The LLVM pass AddressSanitizer already detects out-of-bounds memory accesses to globals, as well as to the stack and the heap. You can pass -fsanitize=address to clang to use it. It has also been ported to GCC under the same flag. You can combine it with UBSan, the undefined behaviour sanitizer, as -fsanitize=address,undefined to catch even more errors; this is again available in both clang and gcc.
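As a minimal sketch of what ASan catches for globals, consider the file below. The function name read_global is hypothetical and only used for illustration. Built with clang++ -fsanitize=address -g, a call with an index inside the array runs normally, while a call such as read_global(3) should make ASan abort with a global-buffer-overflow report.

```cpp
// Global array, same shape as the one in the question.
int a[3] = {0, 1, 2};

// Hypothetical helper: reads a[idx] without any bounds check.
// With -fsanitize=address the compiler instruments this load, so an
// index outside [0, 3) is reported at runtime instead of silently
// reading whatever happens to sit next to the array.
int read_global(int idx) {
  return a[idx];
}
```

Without the sanitizer flag, an out-of-range index is undefined behaviour and typically goes unnoticed, which is the situation the question is trying to detect.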

If for some reason you don't want ASan and you want to proceed with building a system that reflects on the sizes and addresses of global variables, you could declare a global in C, extern SOME_LIST_TYPE global_list;, and have an LLVM pass fill in the data. Given an llvm::Module *M, you can scan all global variables with for (auto &GV : M->globals()) { (see the doxygen), and you can build a constant "GEP" that steps over one element of the global's type to get the pointer to the end. See the GEP FAQ. As a tip on LLVM's API, note that most of these instructions (Add, Mul, GEP) exist in two forms: subclasses of llvm::Instruction and subclasses of llvm::ConstantExpr. You need the second form if you want constant data that you can initialize your array with.

Use auto *the_array_to_fill = M->getNamedGlobal("global_list"); to get your global_list as an llvm::GlobalVariable, then call the_array_to_fill->setInitializer(...) to set its data. You'll need to prepare the data in the type and layout that you want, for example an array of structs with two members (begin and end), or an array of all begins followed by all ends, whatever works for you. The LLVM tutorial covers how to create LLVM IR, which you'll need in order to build up the types and (Constant!) values you initialize your global with.
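To make the target layout concrete, here is a hand-written sketch of the "array of structs with begin and end" variant, using only standard C++. The names GlobalRange and find_global are hypothetical, and the table is filled in by hand as a stand-in for what the pass would emit as the initializer. The expression &gv + 1 plays the role of the constant GEP that steps over one element of the global's type.

```cpp
#include <cstdint>

// Hypothetical record layout: one [begin, end) pair per tracked global.
struct GlobalRange {
  const void *begin;  // address of the first byte of the global
  const void *end;    // address one past its last byte
};

// The globals from the question (s is const-qualified so this is valid C++).
int i;
int a[3] = {0, 1, 2};
const char *s = "Simple string";

// Stand-in for the data an LLVM pass would generate. "&x + 1" steps over
// one whole object of x's type, so it yields the end address.
const GlobalRange global_list[] = {
    {&i, &i + 1},
    {&a, &a + 1},
    {&s, &s + 1},
};

// Returns the entry whose range contains p, or nullptr if p does not
// point into any tracked global.
const GlobalRange *find_global(const void *p) {
  const auto addr = reinterpret_cast<std::uintptr_t>(p);
  for (const GlobalRange &r : global_list) {
    if (addr >= reinterpret_cast<std::uintptr_t>(r.begin) &&
        addr < reinterpret_cast<std::uintptr_t>(r.end)) {
      return &r;
    }
  }
  return nullptr;
}
```

With such a table in place, a check like find_global(p) != nullptr tells the instrumentation whether p points into a tracked global. Note that clang emits the string literal itself as a separate, unnamed global, so a pass iterating over M->globals() would see it in addition to the pointer variable s.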

You may also want to call the_array_to_fill->setLinkage(llvm::GlobalValue::AppendingLinkage); so that the globals from all translation units are combined into one array when linking. Otherwise you get a "multiple definition" error or, with weak linkage, only one of the arrays is kept and the rest are discarded.
