Clang Libtool

Technically, Clang works on an Abstract Syntax Tree (AST) built by the Clang parser (clang/Parse/*), not on the raw input C/C++ code (although the parser is part of the Clang code base). There is, of course, a lexer earlier in this pipeline, but neither the lexer nor the parser is our focus in this tutorial. Clang is responsible for converting the AST to LLVM IR, which is implemented in Clang CodeGen. Once source code is translated to LLVM IR, all instrumentation has to work on LLVM IR. It is certainly easier to instrument a fine-grained language like LLVM IR than complicated top-level source code. But in certain cases (e.g. code formatters, code documentation, etc.), we want to modify the source code itself (overwrite the source file).

Clang Libtool is an API to modify the AST while it is in the buffer and write it back to the source file in the input source language. A developer can insert/remove/modify any node in the AST through the API, and everything else is handled by the libtool. This design gives the plugin developer the flexibility to focus on a specific purpose. Besides a good understanding of C++, a developer will need the following knowledge:

  • The AST structure, including its nodes and their attributes
  • Basic code structure of Clang Libtool
  • The AST matcher (optional, but it gives more flexibility in tool development)

Note: You may face various issues working with Clang Libtool. As an example: Clang Libtool only converts source code to an AST but does not expand macros.

Writeup Explanation

In this write-up, we will build a Clang Libtool that inserts a different version of a target function into the source code and replaces every call invocation of the original function with the new function. Additionally, the newly defined function will print all the parameter values it receives. Consider the following source code:

int doSum(int a, int b){
    int sum;
    sum = a + b;
    return sum;
}
int main(){
    return doSum(10, 20);
}

The above code is a simple C program with a function doSum() that accepts two integers, sums them, stores the result in a local integer variable, and finally returns it. The function can be called from multiple places with different arguments (e.g. here inside main()). Let's say we want to print the parameters received by doSum() for every invocation, and we do not want to modify the original function. So, we write a different version of the method just after the original:

int doSum(int a, int b){
    int sum;
    sum = a + b;
    return sum;
}
int Wrap_doSum(int a, int b){
    printf("a = %d\n", a);
    printf("b = %d\n", b);
    return doSum(a, b);
}
int main(){
    return Wrap_doSum(10, 20);
}

As mentioned earlier, the only required user input is the target function name, in this case doSum. The user may also supply a prefix for the wrapper method.

To achieve this, we have to insert a new function with two printf() calls for the two parameters, which also calls the original target function and returns its result back to the call site. Besides that, we have to replace every call invocation of the target function with the newly created function. We will split this job into the following tasks:

  • Identify the target function.
  • Identify the function parameters and their types.
  • Create a function (prefix + original name) just after the target function.
  • Create the function body with two printf() calls to print the parameter values.
  • Append a call to the target function and return its result.
  • Identify every call to the target function and redirect it to the newly created function.

Abstract Syntax Tree

Let’s start with step 1: understanding the AST. We use the following command to dump the AST and redirect it to a file:

clang-check -ast-dump target_test.c --extra-arg="-fno-color-diagnostics" -- > ast.out

We use clang-check to do a basic error check on the input source and -ast-dump to dump the AST. We add an extra argument to discard the color formatting in the output (the escape codes do not render in a plain file). The full file may look scary, but the only part we need to look at is where the parser generates the AST for doSum() and main().

|-FunctionDecl <../target_test.c:3:1, line:7:1> line:3:5 used doSum 'int (int, int)'
| |-ParmVarDecl <col:11, col:15> col:15 used a 'int'
| |-ParmVarDecl <col:18, col:22> col:22 used b 'int'
| `-CompoundStmt <col:24, line:7:1>
|   |-DeclStmt <line:4:5, col:12>
|   | `-VarDecl <col:5, col:9> col:9 used sum 'int'
|   |-BinaryOperator <line:5:5, col:15> 'int' '='
|   | |-DeclRefExpr <col:5> 'int' lvalue Var 0xc778510 'sum' 'int'
|   | `-BinaryOperator <col:11, col:15> 'int' '+'
|   |   |-ImplicitCastExpr <col:11> 'int' <LValueToRValue>
|   |   | `-DeclRefExpr <col:11> 'int' lvalue ParmVar 0xc778300 'a' 'int'
|   |   `-ImplicitCastExpr <col:15> 'int' <LValueToRValue>
|   |     `-DeclRefExpr <col:15> 'int' lvalue ParmVar 0xc778378 'b' 'int'
|   `-ReturnStmt <line:6:5, col:12>
|     `-ImplicitCastExpr <col:12> 'int' <LValueToRValue>
|       `-DeclRefExpr <col:12> 'int' lvalue Var 0xc778510 'sum' 'int'
`-FunctionDecl <line:8:1, line:10:1> line:8:5 main 'int ()'
  `-CompoundStmt <col:11, line:10:1>
    `-ReturnStmt <line:9:5, col:24>
      `-CallExpr <col:12, col:24> 'int'
        |-ImplicitCastExpr <col:12> 'int (*)(int, int)' <FunctionToPointerDecay>
        | `-DeclRefExpr <col:12> 'int (int, int)' Function 0xc778450 'doSum' 'int (int, int)'
        |-IntegerLiteral <col:18> 'int' 10
        `-IntegerLiteral <col:22> 'int' 20

The AST is, after all, a regular tree structure in which every node derives from its parent node. In this dump, we can see two sibling nodes at the top level. They are both FunctionDecl nodes, one for each function in the user source code. We can see the name of each function with its signature in this format: func_name 'return_type (param_type, param_type)'. Next, we see a ParmVarDecl for every parameter, each carrying the variable name with its type. Finally, a CompoundStmt begins, which is the function body, and it ends with a ReturnStmt. For doSum(), there is more than a ReturnStmt under the CompoundStmt: we can see a BinaryOperator for the = operator in sum = a + b;, followed by a DeclRefExpr indicating the lvalue sum and another BinaryOperator for the + operation, which in turn is followed by two more DeclRefExpr nodes for the variables a and b. Each of the latter hangs off an ImplicitCastExpr to indicate that the lvalue has to be converted to an rvalue. Another important node for this job is the CallExpr in main(). We can see it has two IntegerLiteral children, which define the call's two arguments.

Workspace Structure

It is standard procedure to build a Clang Libtool from the Clang tools directory (i.e. llvm/tools/clang/tools/). So, we create a directory (clang-wrapper) for the tool and add the following line at the end of CMakeLists.txt (in llvm/tools/clang/tools/).

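The line itself is not reproduced here; under the usual in-tree build convention of this LLVM era it would typically look like the following (a sketch, assuming the directory name clang-wrapper created above):

```cmake
add_clang_subdirectory(clang-wrapper)
```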

Inside our tool workspace, we will have another CMakeLists.txt which usually looks like following:

cmake_minimum_required(VERSION 2.8.8)
project(syssec-workshop)

include_directories(${CLANG_SOURCE_DIR}/include ${CLANG_BINARY_DIR}/include)
set(LLVM_LINK_COMPONENTS support)

add_clang_executable(clang-wrapper wrap-method.cpp)
target_link_libraries(clang-wrapper clangTooling clangBasic clangASTMatchers)

install(TARGETS clang-wrapper RUNTIME DESTINATION bin)

At the beginning, it sets the minimum CMake version required to build this tool. Then we give a project name (in this case, syssec-workshop). Next, include_directories points at the Clang source include paths. The set line enables basic LLVM link support.

The add_clang_executable line is important: we first give the executable name (or tool name) and then the list of source files to compile. The target_link_libraries line is also important; it lists the Clang runtime libraries this tool (i.e. clang-wrapper) requires. We will use clangTooling (libtool support), clangBasic (basic Clang support, including user input handling), and clangASTMatchers (the Clang ASTMatcher API). install sets the path where the tool will be installed.

Next to the CMakeLists.txt, we add the C++ source code for the libtool (we will have one file, i.e. wrap-method.cpp). Once everything is ready to compile, we can build the tool the same way we build clang/llvm from its build directory. The libtool will be available in the build directory (i.e. build/bin/).


The workspace directory is available here: Copy it to your (llvm/tools/clang/tools/) and add the directory to the CMakeLists.txt file there. Build the tool with your Clang build. Use the tool in the following way:

clang-wrapper -wrap=true -wrap-prefix=wrap -wrap-target=doSum target_test.c --

Basic Code Structure

The code structure of a Clang libtool can be divided into four parts. We definitely require a main function, which processes command-line user input and prepares the next phase. The next phase is responsible for preparing the Rewriter (metaphorically, a pen) for the compiler-processed AST buffer. In the third step, we prepare the AST matcher (metaphorically, a pattern-recognition engine). Finally, we write handlers that use the Rewriter on the matched AST.

Command Line Parser: Inside main(), we process the command-line options. We can do extra verification on user input here. It also creates the ClangTool instance and runs the tool with a customized ASTFrontendAction. The libtool API uses the factory design pattern to return the instance of the customized ASTFrontendAction.

SYSSECFrontEndAction: This class extends ASTFrontendAction, where we can override CreateASTConsumer() and prepare the Rewriter for the AST buffer. Later, we hand control over to our customized ASTConsumer. Besides that, we also override EndSourceFileAction() to tell the compiler to commit the changes in the buffer to the source file at the end of the process.

SYSSECASTConsumer: This class extends ASTConsumer, where we finally have the ASTContext and define and use the MatchFinder. As private members, it holds the MatchFinder and two handlers. In the class constructor, we define the match patterns and set their respective callback handlers. We override HandleTranslationUnit() to ask the compiler to start the MatchFinder process.

Handlers: The handlers are automatically called when the MatchFinder finds a match. We have two handlers: the first creates the wrap function immediately after the target function with an identical function signature, and the second redirects every call expression targeting the original function to the wrap function.

So, overall the code structure looks like:

class SYSSECWrapper : public MatchFinder::MatchCallback {
public:
  SYSSECWrapper(Rewriter &Rewrite) : Rewrite(Rewrite) {}
  virtual void run(const MatchFinder::MatchResult &Result) {
    // action for the matched pattern
  }

private:
  Rewriter &Rewrite;
};

class SYSSECRedirect : public MatchFinder::MatchCallback {
public:
  SYSSECRedirect(Rewriter &Rewrite) : Rewrite(Rewrite) {}
  virtual void run(const MatchFinder::MatchResult &Result) {
    // action for the matched pattern
  }

private:
  Rewriter &Rewrite;
};

class SYSSECASTConsumer : public ASTConsumer {
public:
  SYSSECASTConsumer(Rewriter &R) : handleWrapper(R), handleRedirect(R) {
    // define MatchFinder patterns
  }
  void HandleTranslationUnit(ASTContext &Context) override {
    Matcher.matchAST(Context);
  }

private:
  SYSSECWrapper handleWrapper;
  SYSSECRedirect handleRedirect;
  MatchFinder Matcher;
};

class SYSSECFrontEndAction : public ASTFrontendAction {
public:
  SYSSECFrontEndAction() {}
  void EndSourceFileAction() override {
    // commit the Rewriter buffer back to the source file
  }
  std::unique_ptr<ASTConsumer> CreateASTConsumer(CompilerInstance &CI,
                                                 StringRef file) override {
    TheRewriter.setSourceMgr(CI.getSourceManager(), CI.getLangOpts());
    return llvm::make_unique<SYSSECASTConsumer>(TheRewriter);
  }

private:
  Rewriter TheRewriter;
};

int main(int argc, const char **argv) {
  // process command line options
  CommonOptionsParser op(argc, argv, SYSSEC_COMPILER_WORKSHOP);
  ClangTool Tool(op.getCompilations(), op.getSourcePathList());
  return Tool.run(newFrontendActionFactory<SYSSECFrontEndAction>().get());
}

Notice that once we have defined the Rewriter in SYSSECFrontEndAction, we carry it up through the other layers of the code structure.

Command Line Option

At the beginning of the source code (global scope), we define multiple llvm::cl options for the different command-line inputs.

// creates an option category to show what functionality is available in this tool
// try clang-wrapper -help
static llvm::cl::OptionCategory
    SYSSEC_COMPILER_WORKSHOP("clang-wrapper options");

// creates multiple options to feed user inputs
// -wrap takes true/false and required
static llvm::cl::opt<bool>
    wFlag("wrap", llvm::cl::desc("Do you want to wrap a function?"),
          llvm::cl::Required, llvm::cl::cat(SYSSEC_COMPILER_WORKSHOP));
// -wrap-prefix takes a string and optional
static llvm::cl::opt<std::string>
    wrapPrefix("wrap-prefix",
               llvm::cl::desc("Select the prefix of the wrapper."),
               llvm::cl::Optional, llvm::cl::cat(SYSSEC_COMPILER_WORKSHOP));
// -wrap-target takes a string and is optional
static llvm::cl::opt<std::string>
    targetMethod("wrap-target", llvm::cl::desc("Name the function to wrap."),
                 llvm::cl::Optional, llvm::cl::cat(SYSSEC_COMPILER_WORKSHOP));

// it is possible to show an extended help message about the tool
static llvm::cl::extrahelp MoreHelp("\nA Clang Libtool to create a wrapper for "
                                    "a function to show its input values\n");

Inside main(), we can do further sanitization on the user-provided command-line input.

  // we can do simple extra user input validation
  if (wFlag) {
    // we at least need to know the target function name
    if (targetMethod.length()) {
      llvm::errs() << "The target wrap function: " << targetMethod << "\n";
      if (wrapPrefix.length()) {
        llvm::errs() << "Prefix (User): " << wrapPrefix << "\n";
      } else {
        // this is the default prefix if the user does not provide one
        wrapPrefix = "syssec";
        llvm::errs() << "Prefix (Default): " << wrapPrefix << "\n";
      }
    } else {
      llvm::errs() << "Please, input a target function name.\n";
      return 0;
    }
  }

As an independent compiler tool, user interaction is important, and libtool gives enough flexibility to handle it.

AST Matcher

We have AST matchers for two different patterns: 1) the function declaration of the target function, and 2) call expressions with the target function as the callee.

To match the following AST node:

|-FunctionDecl 0xc778450 <../target_test.c:3:1, line:7:1> line:3:5 used doSum 'int (int, int)'

We have the following straightforward matcher:

    // we ask AST to match the function declaration with the target function
    // name, if so, callback the handleWrapper
    Matcher.addMatcher(functionDecl(hasName(targetMethod)).bind("wrapFunc"),
                       &handleWrapper);

The .bind(tag) call tags the finding so that we can use the tag later to retrieve the instance of the matched pattern.

To match the following AST:

-CallExpr 0xc778860 <col:12, col:24> 'int'
        |-ImplicitCastExpr 0xc778848 <col:12> 'int (*)(int, int)' <FunctionToPointerDecay>
        | `-DeclRefExpr 0xc7787b8 <col:12> 'int (int, int)' Function 0xc778450 'doSum' 'int (int, int)'

We write the following matcher:

    // we ask AST to match the call expression with target function as callee
    // and if so, callback the handleRedirect
    Matcher.addMatcher(
        callExpr(callee(functionDecl(hasName(targetMethod)))).bind("callMatched"),
        &handleRedirect);

This may seem confusing because there is no function declaration under a call expression. Basically, callExpr(callee(...)) matches through the DeclRefExpr, which we narrow down with functionDecl, since the callee must be a functionDecl (it could be cxxMethodDecl for C++ code).

Details about AST matchers are available here:


This is where we start writing the action for a matched pattern (with the help of the Rewriter). We have two different handlers for the two different matchers. Both of them extend MatchFinder::MatchCallback. The overridden run() receives a const MatchFinder::MatchResult & from which we can extract the findings using the binding tag. We can also get the ASTContext to access the SourceManager.

SYSSECWrapper: We bind the FunctionDecl with the tag wrapFunc, which we access as follows:

const FunctionDecl *func =
    Result.Nodes.getNodeAs<clang::FunctionDecl>("wrapFunc");

Once we have the FunctionDecl instance, we have access to all information related to the function. As an example, we can access the return type and the number of parameters of the function:

string retType = func->getReturnType().getAsString();
unsigned int paramNum = func->getNumParams();

In a similar fashion, we can access each ParmVarDecl, the function body Stmt, etc. For any Expr, we can access its source location using getBeginLoc() or getEndLoc() (there are more functions available to access different locations related to an expression).

      // the target function's end location is the '}', so we ask for a +1 offset
      SourceLocation TARGET_END = func->getEndLoc().getLocWithOffset(1);
      std::stringstream wrapFunction;
      string wrapFunctionName = wrapPrefix + "_" + targetMethod;
      // we create the entire wrap function text
      wrapFunction << "\n" + retType + " " + wrapFunctionName + "(" +
                          funcParamSignature + ")\n{\n" + funcBody + "\n}";
      // let's insert the wrap function at the end of the target function
      Rewrite.InsertText(TARGET_END, wrapFunction.str(), true, true);

Finally, we want to insert a function at the end of the target function. So, once our function signature and body are ready (wrapFunction), we use Rewrite.InsertText(). We first give the location where to insert and then the text. The third parameter of this method selects whether to insert after or before the location; the fourth parameter asks for indentation on new lines.

SYSSECRedirect: We bind the CallExpr in the matcher with the tag callMatched, and we access it here. Once we have it, we only need to replace the callee with the wrap function. So, we use Rewrite.ReplaceText(), where we first give the source range of the callee we want to modify and then the new callee name.

const CallExpr *cexpr =
    Result.Nodes.getNodeAs<clang::CallExpr>("callMatched");
Rewrite.ReplaceText(cexpr->getCallee()->getSourceRange(), wrapFunctionName);

That’s All

Yes, that's all you need to know to write your own Clang libtool. Just follow the code structure and write your code using all the functionality available from Clang. Do not try to break the structure; following it will help you avoid many issues.

Research Review 2018

I will not introduce myself as an expert in security research. Actually, I would never call myself an expert (at most, a professional) in any field. Every research field expands every year, little or much, so it is not really possible to call anyone an expert in a field. I consider myself an advancing software and security researcher, so a professional in this field would be an inspiration to me (with his/her dedication and deep thoughts for the field). The year 2018 was a disappointment to me. People who could be a great inspiration to newbies like me (as I was before 2018) completely failed to show an impartial view and encouragement to move the field forward. Hence, in 2019 we are still asking the same question: why can we still not prevent decades-old security issues? We have failed to resolve old issues while continuously facing new, advanced ones (e.g. online fake-news propagation powered by the latest machine learning technology). This is very frustrating, but at the same time it should encourage newbie and advanced researchers alike to row against the tide (the negative view). I hope this new generation of security researchers will realize how important it is to resolve issues in practice rather than daydreaming over theoretical talks. I hope so because this generation sees how badly the negligence of professional research impacts us today. Our every day starts with a doubt about whatever we see, listen to, and read online. A current trend is to cut people off from the virtual world. This is equally dangerous for the human race and a disgrace for us. We cannot let it happen; people have to stay connected the way they are now, but without any fear of being exiled. I accept that our professionals may change their views, but they cannot contribute; so it is all upon us. I would like to follow up with a few points from my observations:

  • Technology is moving a lot faster than we are ensuring its stability. We are aiming to travel to space while we have not yet secured driverless vehicle systems. We should not slow down our evolution, but more people have to focus on ensuring the stability of our systems. Every year, maybe 5% of graduate students choose security research. It is not because the field lacks funding, but because prospective people see it as a harder job than other research. I don't know how to draw more people into this field, because their consideration is fair; it is a challenging field to work in. Maybe other fields have to impose strictness on their own research. As an example, machine learning research could be required to pass a 90% security guarantee in practice. This would both reduce the pressure on security and equalize the challenge bar across fields.
  • Traditionally, security researchers have to consider both the performance and the scalability of their system to make it practical. I believe this is not the right order to follow. First of all, the implementation has to be 100% correct. A system can offer a 90% security guarantee due to the complexity of the systems it works on; by this I mean a system can have limitations, but the part it does cover should be 100% correct. It is always possible to overcome limitations later, but once a system grows up, it is no longer possible to correct it. Performance should be the third factor to consider. I acknowledge that if a security feature causes 100% performance overhead, it is useless. But if it causes 50% overhead, in 5 years that will be only 20% because of the advancement of computing machinery. So, let us accept such a system and move forward with it. Only with this strategy can we avoid questions like why decades-old issues are still around us. Overall: correctness first, then scalability, and then performance.
  • Changing the view is the most urgent need. We know static analysis is an overapproximating method, not 100% correct. It is meaningless to stick to it. Let us use it as a secondary tool and focus on more precise dynamic and symbolic analysis. These tools need to be improved, considering they are the ultimate future for resolving our issues for real.
  • Usually, in research, we target an issue and theoretically design a system to prevent it. When we get deep into the initial design, we find more complicated design problems. Hence, sometimes, we design a subsystem that actually creates another issue, stupidly even the very issue we are trying to resolve. If you try to provide a security guarantee for indirect jumps and have a design in mind but hit a technical design problem, don't design a sub-system that creates more indirect jumps when you cannot ensure 100% code coverage. This is complete stupidity. If you cannot see a way out, kill your idea and move forward with another.
  • In security research, we all see two divisions: academic and hacker. This division is extremely visible (they even attend different conferences of their own). Researchers on both sides are responsible for this situation. This tradition needs to change. The blame game is not good for humanity. Both sides have to fix their shortcomings and move forward together. The academic researcher should be impartial, whatever their representative institution is. Hackers have to be passionate, because a system can only survive through years of dedication.

Have the best of 2019, security researchers.

My Reading: Software Defense

HexType: Efficient Detection of Type Confusion Errors for C++

Source Code:

Summary: In a typed programming language, typecasting is a common phenomenon. With the object-oriented programming paradigm, this feature turns into a dangerous attack surface. When a derived class object is cast to a parent class object (upcast), it is usually safe, considering the parent class is a subset of the derived class. On the other hand, casting a parent class object to a derived class pointer (downcast) can raise a serious issue (if the derived class has extra members beyond the parent class). Moreover, these casts are often indirect (an object passed to a function as a class pointer type). The research work attempts to resolve the issue with runtime checking (checks similar to dynamic_cast, which is expensive and limited to classes with virtual functions). It expands the coverage with a more efficient runtime check built on their own data structures.

Design: The design can be divided into three parts. 1) Type Relationship Information: This information is available to the compiler at compile time to verify static casts, but it has to be instrumented into the binary to be usable at runtime. 2) Object Type Tracing: Whenever an object is allocated, a table is updated with the object's information and its original class type. To do that, they instrument every allocation site. The table they use is a combination of a hashmap in which each hash entry holds an RB-tree. The hashmap check serves as the fast-path check, while the RB-tree traversal serves as the slow-path check. The hashmap is frequently updated with recently verified objects. 3) Type Casting Verification: At every type casting location, instrumentation asks for runtime verification of the cast object. The verification collects runtime information and checks it against (1) [whether the casting path is allowed according to the type relationship information] and (2) [the original type of the object being cast]. They extend this verification coverage to dynamic_cast too, by replacing it with their own. They also optimize performance by excluding safe objects from tracking and safe casts from verification (detecting safe objects and casts through static analysis).

Advantage: The design is simple enough to be scalable. They also detected a couple of type confusion bugs using it. Performance overhead is better than CaVer but worse than TypeSan. They also achieve better code coverage.

Disadvantage: The static-analysis-based optimization is not described completely. Traditional analyses are not reliable. The handling of reinterpret_cast is also dubious.

Discussion: Overall, type confusion is an interesting issue to cover. Detection through runtime verification is promising. C could also have similar bugs; they should all be covered.

My Reading: Memory Defense

Stack Bounds Protection with Low Fat Pointers

Source Code:

Summary: This research work is an extension of the authors' other work (heap bounds protection with low-fat pointers). The concept of low-fat pointers originated in that previous paper, and this one describes it well too. The basic concept of low-fat pointers is: use the pointer value itself to calculate its object's boundary instead of extending the pointer or creating shadow memory. Additionally, the memory region is split into different virtual pages based on object size for better performance. However, explicitly managed heap memory is easier to handle than the program-maintained stack memory, so they propose a stack allocator that handles all stack memory operations.

Design: According to the design, when a pointer needs its object information, it accesses a lookup table, using the pointer address divided by 32GB (the size of a virtual page) as the index to find its object size. Next, using a round-down technique with the pointer address and the object size, it can calculate the base address of the object's memory. Now, having both the base address and the object size, it can validate the object boundary every time the pointer accesses a different location in that object. The bounds checks operate over instrumented binaries. For heap allocation (e.g. malloc), they can modify the allocator to place objects into the size-matched virtual page. But stack objects would all be allocated in the same virtual page dedicated to the stack. Due to this, physical stack memory is non-fragmented, which simplifies memory management, and for stack objects it is important to handle memory efficiently. Fitting stack objects into the same design as the heap, without hurting regular stack operation, is critical; moreover, it has to be efficient. To compute the lookup index they use the lzcnt instruction; for aligning, they do a masking operation (another lookup table); and they allocate the objects in split 32GB virtual pages similar to the heap, but now each 32GB page is dedicated to either heap or stack. Everything is fine except that physical memory is now more fragmented due to the redesigned stack allocator. So, they propose memory aliasing to map every virtual stack page to the same physical memory. They do apply various optimizations for where to check bounds and where not to, but those are not a major contribution.

Advantage: Low-fat pointers are memory efficient, though when it comes to performance, the design still has to sacrifice a little more memory than it strictly needs.

Disadvantage: First of all, it is now easier to predict where an object is located in physical memory by knowing only the object size. Although pointer integrity is not part of the bounds check, this can now cause greater disaster if left unchecked. The memory aliasing design for virtual stack pages is not reliable; the design is obscure.

Discussion: The design and evaluation do not show much hope for a practical implementation of low-fat pointers for stack objects. Overall, I feel low-fat pointers cannot displace the shadow memory design.

System and Software Security Research Conference

If your research is in system and software security, nothing can be worse than counting how many different conferences you should follow. Due to the nature of the work, you surely have to follow security, systems, and software engineering conferences. Besides those, as security research is very much practical, you also have to follow hacker conferences. Sometimes, for a literature review, you may also have to follow 2nd-tier conferences (but they are not strictly necessary). Here I list what I follow from time to time:

Top-tier Security Conference

* IEEE Symposium on Security and Privacy (S&P/Oakland)
* Computer and Communications Security (CCS) 
* USENIX Security Symposium (USENIX Security)
* Network and Distributed System Security Symposium (NDSS)

Top-tier Software Engineering Conference

* International Conference on Software Engineering (ICSE)
* ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)
* Programming Language Design and Implementation (PLDI)

Top-tier System Conference

* ACM Symposium on Operating Systems Principles (SOSP)
* USENIX Annual Technical Conference (USENIX ATC)
* European Conference on Computer Systems (EuroSys)
* USENIX Symposium on Operating Systems Design and Implementation (OSDI)

Hacker Conference

* BlackHat
* DefCon

2nd-tier Security Conference

* International Symposium on Research in Attacks, Intrusions and Defenses (RAID)
* European Symposium on Research in Computer Security (ESORICS)
* IEEE European Symposium on Security and Privacy (EuroS&P)

My Reading: Exploit Generation

It is impossible to write a complicated system without any hidden vulnerability residing in it. While researchers continuously work on tools to detect these hidden vulnerabilities, attackers keep coming up with new kinds of vulnerabilities. This race seems never-ending, so researchers also concentrate on designing defense systems that can at least reduce the impact of a vulnerability. These defense systems restrict an attacker from developing an effective exploit against the vulnerable program. This opens another door for attack researchers: developing tools to generate exploits automatically.

Block Oriented Programming: Automating Data-Only Attacks

Source Code:

Summary: Vulnerable software with an active defense system (e.g. Control-Flow Integrity, Shadow Stack, Address Space Randomization, etc.) is hard to exploit. Control-Flow Integrity (CFI) restricts execution to valid control flows, although because of a weak control-flow graph (CFG), coarse-grained CFI systems allow overapproximated control transfers. This keeps the exploit door open for an attacker to lead the execution to a certain state (e.g. popping a shell, leaking information, etc.). In this research work, the authors propose a high-level language for writing Turing-complete exploits and a new approach to generating exploits that uses basic blocks as gadgets and validates that their control flow follows valid paths in the CFG. This approach can beat the shadow stack, which ROP gadgets cannot do.

Design: When a vulnerable system is built with CFI and a shadow stack, it is neither possible to jump to an arbitrary location in the program nor possible to return anywhere. The major contribution here is that by using basic blocks as gadgets instead of ROP gadgets, an attacker can let the program follow only valid control transfers while still achieving his goal. But before that, the attacker has to express what he hopes the exploit will do. So, they propose a language for writing exploit payloads (SPL) which is architecture-independent and Turing-complete. An attacker can use abstract registers as volatile variables and a subset of C library calls (e.g. execve etc.). An SPL payload can contain multiple statements, and for each of them, block-oriented programming (BOP) finds a set of basic blocks that satisfies that statement. For example, if a statement expects a register to hold the value 7, BOP finds basic blocks in the vulnerable program that end with some register containing 7. These sets of basic blocks are identified as candidate blocks. Next, the system combines the basic blocks, each representing one statement, without one basic block clobbering another's purpose. These are called functional blocks, and now it is time to identify their dispatcher blocks. Dispatcher blocks connect one functional block to another without clobbering any of them. Searching for dispatcher blocks is NP-hard, and even approximation algorithms do not work (because of the large code space of real-world programs). So, they propose a backtracking system with a minimum threshold for search depth. This may overlook many possible dispatcher paths, but it is important for scalability. Combining the functional blocks with their dispatcher blocks yields the delta graph, from which a context-sensitive shortest path leads a vulnerable entry point to the exploit.
Although the exploit is valid in the presence of CFI and a shadow stack, it is important to validate it by solving the constraints along the different control transfers. A concolic execution system is used to prove that the subgraph of the CFG chosen for the exploit is actually executable. Now, if an attacker has found a vulnerability in a system, he can use this exploit generation system to figure out what exploit could be useful.

Advantage: Their exploit generation technique is a forward approach, which makes it easy to follow. The approach of block-oriented gadgets is realistic (ROP gadgets are impractical against a hardened system).

Disadvantage: Too many thresholds limit the power. The SPL idea is not strictly necessary; it is still essentially C code. The evaluation shows the system mostly fails to generate exploits for high-impact goals like spawning a shell.

Discussion: Using basic blocks and following the control flow graph to find a valid subgraph is a good concept. But, due to the heuristics, the results fall well short of expectations. A forward approach also limits what else a vulnerable system could be made to deliver.

SVF: Interprocedural Static Value-Flow Analysis in LLVM

SVF is a static analysis framework implemented in LLVM that allows value-flow construction and pointer analysis to be performed in an iterative manner (sparse analysis – analysis conducted in stages, from an overapproximate analysis to a precise, expensive one). It uses points-to information (by default from Andersen's analysis) and constructs an interprocedural memory SSA (Static Single Assignment) form in which def-use chains of both top-level and address-taken variables are included. From the design perspective, SVF can be seen as two modules: pointer analysis and memory SSA. To allow points-to analyses other than Andersen's, the pointer analysis is split into three parts: Graph, Rules, and Solver. To give control over scalability and precision, the memory SSA is designed to let users partition memory into a set of regions. Compared to previous work, SVF takes a big leap by constructing an interprocedural value-flow graph. To identify bugs, it is an important requirement to be able to analyze a program across its procedural boundaries.

The latest version of SVF is implemented on LLVM 7.0. SVF takes LLVM IR (a bitcode file) as input (an LLVM IR module) and can analyze both C and C++ programs. To use SVF, each source file is compiled with Clang to generate bitcode (clang -emit-llvm source.c -o source.bc), and all the .bc files are then linked into a single large .bc file (llvm-link source1.bc source2.bc -o source.bc), e.g. through the LLVM Gold plugin.

As mentioned before, the pointer analysis consists of three loosely coupled components: Graph, Rules, and Solver. The graph is a higher-level abstraction showing where a pointer analysis should take place. A rule defines how to derive points-to information from a statement in the graph; the rules component is where anyone can implement a points-to analysis mechanism different from Andersen's. As the pointer analysis takes place interprocedurally, it is important to have a completely separate part that decides the order in which the points-to analysis works over the graph, i.e. the solver. As an example, Andersen's points-to analysis uses transitive closure rules and a solver named wave. Similarly, a flow-sensitive pointer analysis uses a set of flow-sensitive strong/weak update rules with a points-to propagation solver on a sparse value-flow graph.

There are four steps in the value-flow graph construction. First, a 'Mod-Ref Analysis' captures interprocedural reference and modification side effects for each variable (a set of indirect defs). 'Mem Region Partitioning' separates memory into regions, giving the user the ability to decide scalability/precision trade-offs. The 'Memory SSA' phase takes place once the indirect uses and defs are known; it follows a standard SSA conversion algorithm to put the program into memory SSA form. Finally, the 'VFG Construction' module links the uses of a variable to its defs and constructs the value-flow graph.

The following statements are important to construct a complete value-flow graph for address-taken variables:

  1. ADDROF is known as an allocation site (a stack object [alloca instruction] or a global object [an allocation site or a program function name] or a dynamically created object at its allocation site). Note: For flow-sensitive pointer analysis, the initializations for global objects take place at the entry of main().
  2. COPY denotes either a casting instruction or a value assignment from a PHI instruction in LLVM.
  3. PHI is introduced at a confluence point in the CFG to select the value of a variable from different control-flow branches.
  4. LOAD is a memory accessing instruction that reads a value from an address-taken object.
  5. STORE is a memory accessing instruction that writes a value into an address-taken object.
  6. CALL denotes a call instruction, where the target can be either a global variable (for a direct call) or a stack virtual register (for an indirect call). At a call site, the return value, and at the entry of a procedure, the parameters, can be defined either directly or indirectly.

One featured client of SVF is source-sink analysis (detecting security bugs). An example bug detector is a memory leak detector, where it is important to find out whether a memory allocation (source) will reach a free site (sink) or not. SVF's value-flow graph can be used by a source-sink analysis client to achieve this. Another client could be selective instrumentation, which allows a dynamic analysis tool to look only at suspicious sites. It could also be used to build a debugging tool that guides the developer.

Besides their 40K lines of code for SVF, the group also open-sourced a pointer analysis micro-benchmark suite, PTABen.

My Reading: Software Attack

k-hunt: Pinpointing Insecure Cryptographic Keys from Execution Traces

Source Code:

Summary: It would be useful for attackers to identify the memory locations where an application stores its cryptographic keys. It would be even more useful to do taint analysis for various purposes (e.g. identifying whether a key is insecure). This research uses an online dynamic verification system to identify cryptographic key memory locations and verify whether the keys are secure.

Design: The key challenges in identifying a cryptographic key are: 1) how to identify cryptographic operations without a signature (because proprietary algorithms must be covered too), and 2) how to identify the memory buffer holding the key. Additionally, to detect an insecure key in a complex program, it is important to design an optimized taint mechanism. For each of these, they present key insights. For case 1, they use three heuristics: identify basic blocks that involve arithmetic operations, are repeatedly executed, and whose input-output relationship contains randomness. For case 2, they use two additional heuristics: the key size is usually smaller than the data buffer size, and keys are usually derived with randomness just before the cryptographic execution. Once the key is identified, in the next phase they rerun the dynamic analysis to detect insecure keys. A key is tagged insecure if: 1) it is generated with inadequate randomness, 2) in a remote client-server setting, only one of the parties participates in the key derivation, or 3) the key is not cleaned up immediately after use.

Advantage: The design is simple and modularized. It only depends on one specific technology, dynamic binary analysis. The evaluation is convincing.

Disadvantage: The overhead of online verification is high. There is still the possibility of a communication cut. The rerun for insecure key detection could fail because of ASLR.

Discussion: Overall, the research makes a significant contribution. More insecure-key cases could be derived and implemented on top of it in the future.

My Reading: Control Flow Integrity

Due to memory vulnerabilities in popular programming languages (C, C++), an attacker can overwrite other memory and take control of a vulnerable system. For the attacker, control flow hijacking is one of the first things to attempt once a vulnerability is found. Control flow hijacking means diverting the regular program execution path to an attacker-chosen target. As an example, an attacker can spawn a shell by redirecting execution to the system function with appropriate arguments set up in memory. If the vulnerable program runs as root, the attacker gains root as well. Control flow integrity keeps an active eye on the program at execution time so that it cannot depart from its expected path. Two things are most important in CFI: 1) a complete control flow graph, and 2) a strong enforcement policy. If either of them loses a little accuracy, it harms the other. Both features depend on each other, so it is expected that every CFI system explains them properly.

Source Code:

Summary: The project tries to achieve an ambitious goal: based on execution history, enforce a CFI policy that allows only one valid target per indirect jump/call. For decades, researchers have tried to design strict enforcement, a strong CFI policy, but performance overhead and complex real-world programs fail them every time. This project is no exception. They fail to evaluate SPEC benchmarks like gcc, xalancbmk, omnetpp etc., which are the most important SPEC benchmarks for evaluating a CFI defense considering their complexity (they also have the largest numbers of indirect calls). The research depends heavily on Intel PT hardware support, which usually fails to capture complete information when packets are generated too frequently. Basically, Intel PT is designed for offline debugging purposes, so there is no way to fix this issue for an online analysis.

Design: The project can be divided into three parts: 1) constraint data identification, 2) arbitrary data collection, and 3) online analysis. Constraint data is a program variable which is not a function pointer but has a significant impact on where a function pointer will target. For example: if there is an array of function pointers, and someone uses variable i as the index into that array to make an indirect call, then i is constraint data. They use compile-time analysis: first they identify instructions related to function pointer calculation, and second they check whether the operands of those instructions are variables. Once they know the constraint data, they have to collect it at runtime (2). They use compile-time instrumentation for this purpose. As they plan to use Intel PT to track execution history, they have to figure out how to force the instrumentation to enter an arbitrary value into the Intel PT trace. Hence, they come up with a trick: they call their own write function with the constraint data value as the parameter, and inside it they create multiple indirect function calls with return-only bodies, where the target is determined by a chunk of the passed value. To reduce the number of packets in the trace, they do the same for sensitive basic blocks, assigning a unique id to each of them. With all of this, they can avoid checking TNT packets (conditional control transfers) and only check TIP packets (indirect control transfers). For the online analysis (3), they decide to do it in a separate process running in parallel with the original program. They define a read method to reconstruct the arbitrary data from the Intel PT packet trace and validate it against a mapping.

Advantage: If you could perfectly identify every piece of constraint data for every sensitive indirect control transfer, if you had perfectly working hardware to trace the history, and if the separate validation did not harm your security guarantee, then it would be a very good system.

Disadvantage: Unfortunately, none of these works perfectly. Static analysis of complex real-world applications will never succeed with such a simple algorithm; there will always be special cases, and the analysis mechanism has to be updated time and again. Next, the imperfect hardware is eventually reported by the authors themselves. And everyone knows you need to check before you jump: pausing the program at sensitive system call points can easily enable a DoS attack by building up a long pending list of validation checks through frequently created indirect control flow transfers. Regarding the performance overhead claim, they mention a 10% average overhead. But for perlbench and h264ref they report 47% and 24% overhead, while they did not evaluate the similar benchmarks gcc, omnetpp, and xalancbmk. If we assume even 25% overhead for each of those, the overall overhead rises to at least 15%. They evaluated nginx and vsftpd but did not mention which test suites they used. Moreover, their system creates more indirect calls, maybe even more than exist in the target program.

Discussion: To emphasize their security improvement, they present a case study from the h264ref SPEC benchmark. They show a structure that has a member function pointer. This structure is later used to create an array of size 1200. From that, they simply assume there are 1200 valid targets for that member function pointer (a unique one per element), which they reduce to 1. But the truth is that the member function pointer only ever takes three valid functions in the entire source code. The project is too ambitious and hence not practical.

PT-CFI: Transparent Backward-Edge Control Flow Violation Detection Using Intel Processor Trace

Source Code: N/A

Summary: Intel PT is an Intel hardware feature for offline debugging. It can capture compressed data packets for indirect control flow transfers, conditional branches taken/not taken, etc. PT-CFI attempts to use this hardware feature to protect backward edges by enforcing shadow-stack-style protection. It leaves forward indirect control transfers out of scope due to the unavailability of a complete points-to analysis to generate the required CFI policy. It also limits its verification to critical syscalls to keep the performance overhead down.

Design: PT-CFI has four components: 1) a packet parser generates the TIP sequence and feeds it to 2) TIP graph matching (the graph is built from training). If there is no match, 3) deep inspection decodes the trace and constructs a shadow stack. If the sequence matches the shadow stack, the new TIP sequence is added to the TIP graph; otherwise 4) the syscall hooker is informed to terminate the program.

Advantage: The paper describes the basic knowledge of Intel PT well.

Disadvantage: The paper fails to clarify crucial parts. For example, in deep inspection, they construct a shadow stack based on Intel PT traces and claim to match it with the shadow stack (what shadow stack?). They leave forward indirect control transfers out of consideration. The performance overhead is still very high (e.g. 67% on the gcc SPEC benchmark).

Discussion: Intel PT was introduced for offline analysis; using it for online validation is not, on the whole, a good idea. The CFI policy generation via the training concept is not well described. Most of all, the deep inspection is hugely misleading.

GRIFFIN: Guarding Control Flows Using Intel Processor Trace

Source Code:

Summary: The authors only attempt to demonstrate performance overhead optimization when using Intel PT for online verification. They claim to verify enforcement policies of different strictness for both backward and forward indirect control transfers, while completely discarding any discussion of how they obtain the policies they verify against.

Design: To efficiently extract information from the Intel PT trace log, they propose a fast lookup design. The basic-block-to-dereferenced-pointer lookup is basically a mechanism similar to a cache. They hardly describe anything related to their coarse-grained and fine-grained policies, but as they only trace TIP packets in the Intel PT trace, they can at most use a policy based on indirect control transfer information. They simply omit any meaningful discussion of the stateful policy for forward control transfers by mentioning that they use an approach similar to PathArmor.

Advantage: They cover advanced topics like how to handle context switches, process forks, etc., even while omitting much-needed basic discussion.

Disadvantage: The writing is a complete hide-and-seek game. It misses much of the basic discussion and completely discards anything about how they obtain the policies. Moreover, they put enforcement in a parallel process and only enforce strict restrictions at sensitive syscall entries. This undermines the whole idea of control flow integrity. Hence, whatever performance overhead they achieve is not meaningful.

Discussion: Offline debugging hardware should not be considered for online verification; it is just a waste of time.

Transparent and Efficient CFI Enforcement with Intel Processor Trace

Source Code: N/A

Summary: The project aims to protect indirect control transfers through a coarse-grained indirect control flow graph, using Intel PT only at security-sensitive system call points. The system uses a hybrid of fast and slow checks to achieve efficiency. The fast check does not require decoding the trace and is only available if the dynamic-fuzzing-based training CFG contains the exact trace. Otherwise, the slow path has to decode the trace and verify it against a static-analysis-based CFG (TypeArmor).

Design: The design consists of four tasks. First, there is a static-analysis-based CFG. Then, the CFG edges are ranked based on the dynamic fuzzing training CFG. A system call interceptor in the kernel flags security-sensitive system calls. The flow checker collects the trace and, based on it, matches against the fast path or the slow path. The CFG edges come only from indirect control transfers, as the runtime only uses Intel PT TIP packets.

Advantage: The performance overhead is acceptable, though the security guarantee is very low. The paper tries to describe every step.

Disadvantage: The design is very limited. First of all, they only monitor security-sensitive indirect control flow transfers. The fast path is only available for edges in the fuzzing CFG (so it depends on code coverage). The slow path uses a static CFG, which is overapproximate. The CFG itself only consists of indirect control transfer edges, as only Intel PT TIP packets are used.

Discussion: This is the third paper that tries to use Intel PT, hardware meant for offline debugging, for online verification. Limiting monitoring to security-sensitive system calls once again proves that this hardware is not a good choice for this purpose.