BlackBox Code Coverage Fuzzing

In this blog post, we will discuss fuzzing, the importance of code coverage in fuzzing, and a basic implementation of blackbox code coverage. Before we go any further, let's familiarize ourselves with the terminology:

Fuzzing (https://en.wikipedia.org/wiki/Fuzzing):

“A developer (human) can never compute all possible input which can be fed to the program being developed, so he asserts some conditions and develops the program based on the assertions. Fuzzing helps in finding which inputs will fail the developer assertions and cause the application to behave differently.”

Fuzzing is a software testing method in which random, invalid, or unexpected input data is continuously fed to a program that is monitored for crashes. It is used to discover programming errors: it helps uncover security vulnerabilities (memory leaks, use-after-free (UAF), type confusion, out-of-bounds (OOB) reads or writes, etc.) as well as UI bugs that can cause the program to hang when unexpected data is passed to it.

Corpus:

A corpus is a set of inputs that act as seeds: they are mutated to generate random data, which is then fed to the program being fuzzed.
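
As a rough illustration of how a seed from the corpus might be turned into new testcases, the sketch below flips a few random bits in a copy of the seed. The function name `mutate`, the number of flips, and the sample seed are assumptions made for illustration only; real fuzzers use much richer mutation strategies.

```python
import random

def mutate(seed: bytes, flips: int = 4, rng: random.Random = None) -> bytes:
    """Return a copy of `seed` with `flips` randomly chosen bits inverted."""
    rng = rng or random.Random()
    data = bytearray(seed)
    if not data:
        return bytes(data)  # nothing to mutate in an empty seed
    for _ in range(flips):
        pos = rng.randrange(len(data))       # pick a byte
        data[pos] ^= 1 << rng.randrange(8)   # invert one bit in it
    return bytes(data)

# Hypothetical corpus holding a single seed input
corpus = [b"GIF89a\x01\x00\x01\x00"]
rng = random.Random(1234)  # fixed seed so that runs are reproducible
testcases = [mutate(rng.choice(corpus), rng=rng) for _ in range(5)]
```

Each entry in `testcases` has the same length as the seed and differs from it in at most four bits.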

Every testcase fed to the program being fuzzed consumes CPU cycles, time, and power. To minimize the cost of fuzzing and find more bugs in less time, the corpus should be optimized to execute as many of the program's instructions as possible (ideally close to all of them). This can be achieved by applying the concept of code coverage.

Code Coverage (https://en.wikipedia.org/wiki/Code_coverage):

Code coverage is a measure used to describe the degree to which the source code of a program is executed when a testcase runs. A program with high test coverage, measured as a percentage, has had more of its source code executed during testing which suggests it has a lower chance of containing undetected software bugs compared to a program with low test coverage.

In a nutshell, code coverage is the process of:

  • Finding areas of a program that are not generally reachable with common testcases
  • Creating additional/manual testcases to reach as many code paths as possible
  • Determining a quantitative measure of code coverage, which is an indirect measure of the quality of the corpus
  • Minimizing the corpus by removing redundant testcases that do not increase code coverage

Code coverage does not assure the quality of the product itself. It measures the quality of the testcases (corpus): a corpus with higher coverage exercises more of the code and therefore has a better chance of exposing security vulnerabilities, which in turn helps make the product more robust and secure for the end user.
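
The last two points can be sketched in a few lines. The example below assumes that coverage has already been recorded as a set of block offsets per testcase; the names `coverage_percent`, `minimize_corpus`, and the sample data are hypothetical. The minimization is a simple greedy selection, which gives a small corpus but not necessarily the smallest possible one.

```python
def coverage_percent(hit_blocks, all_blocks):
    """Share of the known basic blocks that were executed, as a percentage."""
    all_blocks = set(all_blocks)
    if not all_blocks:
        return 0.0
    return 100.0 * len(set(hit_blocks) & all_blocks) / len(all_blocks)

def minimize_corpus(coverage_by_testcase):
    """Greedy selection: repeatedly keep the testcase that adds the most new blocks."""
    remaining = {name: set(blocks) for name, blocks in coverage_by_testcase.items()}
    covered, kept = set(), []
    while remaining:
        # sorted() makes tie-breaking deterministic
        best = max(sorted(remaining), key=lambda n: len(remaining[n] - covered))
        if not remaining[best] - covered:
            break  # every testcase that is left is redundant
        covered |= remaining.pop(best)
        kept.append(best)
    return kept, covered

# Hypothetical recorded coverage: testcase name -> offsets of the blocks it executed
recorded = {
    "a.bin": {0x10, 0x20, 0x30},
    "b.bin": {0x20, 0x30},   # subset of a.bin, adds nothing new
    "c.bin": {0x30, 0x40},
}
kept, covered = minimize_corpus(recorded)
```

With this sample data, `kept` is `["a.bin", "c.bin"]`: `b.bin` is dropped because it does not reach any block that `a.bin` has not already reached.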

The simplest scheme of code coverage is at the function level. Function coverage measures which functions were called and which were not. While this is useful at a very broad level, it does not provide the granularity needed to measure meaningful coverage for something like a file parser. A better metric is “block” coverage. A basic block, in coverage terms, is a section of code that always executes in sequence, with no jumps in or out.

A practical way to implement code coverage through instrumentation is therefore to measure it at the basic block level.

Basic Block:

A basic block is a straight-line sequence of instructions with no jumps into or out of the middle of the block: execution enters at the first instruction and leaves at the last.
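
To make the definition concrete, here is a minimal sketch that finds the block starts in a toy instruction listing. The instruction format `(address, mnemonic, jump_target)` and the small set of branch mnemonics are simplifying assumptions; a disassembler such as IDA does this work on real code.

```python
def block_starts(instructions):
    """instructions: list of (address, mnemonic, jump_target_or_None) in address order."""
    if not instructions:
        return []
    branch_ops = {"jmp", "jz", "jnz", "ret"}
    starts = {instructions[0][0]}                    # the entry point starts a block
    for i, (addr, op, target) in enumerate(instructions):
        if op in branch_ops:
            if target is not None:
                starts.add(target)                   # a jump destination starts a block
            if i + 1 < len(instructions):
                starts.add(instructions[i + 1][0])   # so does the instruction after a branch
    return sorted(starts)

# Hypothetical function: if (x == 0) return 1; else return 2;
toy = [
    (0x00, "cmp", None),
    (0x02, "jz",  0x08),
    (0x04, "mov", None),
    (0x06, "ret", None),
    (0x08, "mov", None),
    (0x0A, "ret", None),
]
print([hex(a) for a in block_starts(toy)])  # prints ['0x0', '0x4', '0x8']
```

The toy function splits into three basic blocks: the comparison and conditional jump, the fall-through path, and the jump target.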

In IDA Pro, each rectangular block in the flow graph is a basic block.

Fig: Basic Blocks in IDA Pro

Now that we are familiar with the jargon, a basic implementation of code coverage instrumentation would be:
  1. Extract the starting address of every basic block from the module
  2. Run the binary under a debugger of your choice and put breakpoints on the starting addresses of all the basic blocks
  3. For every testcase fed to the binary, record all the breakpoints hit and measure the code coverage
  4. After the initial corpus is tested, minimize the corpus based on the recorded code coverage information
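
The bookkeeping behind steps 2 and 3 can be sketched independently of any particular debugger. In the example below, `CoverageRecorder` and its methods are hypothetical names; we assume the debugger you choose calls `on_breakpoint` from its breakpoint callback and that breakpoints are treated as one-shot, so each block is counted only once per testcase.

```python
class CoverageRecorder:
    """Bookkeeping for breakpoint-based block coverage (debugger-agnostic)."""

    def __init__(self, block_offsets):
        self.all_blocks = set(block_offsets)
        self.armed = set(self.all_blocks)   # breakpoints that are still in place
        self.hits = {}                      # testcase name -> set of block offsets
        self.current = None

    def start_testcase(self, name):
        self.current = name
        self.hits[name] = set()
        self.armed = set(self.all_blocks)   # re-arm every breakpoint

    def on_breakpoint(self, offset):
        """To be called from the debugger's breakpoint callback."""
        if offset in self.armed:
            self.armed.discard(offset)      # one-shot: the first hit is enough
            self.hits[self.current].add(offset)

    def coverage(self, name):
        if not self.all_blocks:
            return 0.0
        return 100.0 * len(self.hits[name]) / len(self.all_blocks)

# Simulated run: block 0x4 is hit twice, 0x99 is not a known block
recorder = CoverageRecorder([0x0, 0x4, 0x8, 0xC])
recorder.start_testcase("t1.bin")
for offset in [0x0, 0x4, 0x4, 0x99]:
    recorder.on_breakpoint(offset)
```

Removing a breakpoint after its first hit keeps the overhead low, since frequently executed blocks stop the program only once per testcase.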

To help you get started with the steps listed above, we have written a small IDAPython script that extracts the starting addresses of all the basic blocks in a module. The information produced by this script can be used to instrument the binary for code coverage.

# Uses the legacy IDAPython API names and Python 2 syntax, as shipped with older IDA releases.
import os

import idautils
import idc
import idaapi

def BasicBlocks(ea):
    """Return the start offset of every basic block in the function at ea."""
    base = idc.MinEA()  # offsets are relative to the lowest address in the database
    func = idaapi.get_func(ea)
    func_start = func.startEA
    func_end = func.endEA
    flow = idaapi.FlowChart(func)
    basic_blocks = []
    for block in flow:
        start = block.startEA
        end = block.endEA
        # keep only blocks that lie inside the function's own address range
        if start >= func_start and end <= func_end:
            basic_blocks.append(int(start - base))

    return basic_blocks

def main():
    working_dir = ""  # Change accordingly
    base = idc.MinEA()
    functions = {func: idc.GetFunctionName(func) for func in idautils.Functions()}

    # iterate over each function
    data_blob = []
    total_bb = 0
    for key in functions:
        bb = BasicBlocks(long(key))
        # [function offset, function name, number of basic blocks, [block offsets]]
        t_data = [int(key - base), functions[key], len(bb), bb]
        total_bb = total_bb + len(bb)
        data_blob.append(t_data)

    print "Total Number of Basic Blocks in Module %d" % total_bb
    file_name = "bb_test.txt"
    # os.path.join avoids a broken path when working_dir has no trailing separator
    with open(os.path.join(working_dir, file_name), "w+") as fh:
        fh.write(format(data_blob))

if __name__ == '__main__':
    main()

The above script writes its result in the following format, where all addresses are offsets from the lowest address in the database:
[[function_offset, function_name, number_of_basic_blocks, [bb_offset, ..., ...]], [...], [...], ...]

The following image shows the beautified output of the script:

Fig: Beautified output of the script
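
As a sketch of how this output could be consumed, the example below reads the list back with `ast.literal_eval` and reports coverage per function. It runs in a regular Python 3 interpreter outside IDA. The helper names and the sample values are hypothetical, and we assume the file contains plain integers (no Python 2 `L` suffixes).

```python
import ast

def load_blocks(text):
    """Parse the list written by the script into {function_name: [block offsets]}."""
    blob = ast.literal_eval(text)   # parses Python literals without executing code
    return {name: blocks for _offset, name, _count, blocks in blob}

def per_function_coverage(blocks_by_function, hit_offsets):
    """Return {function_name: (blocks hit, total blocks)}."""
    hit = set(hit_offsets)
    return {name: (len(hit & set(blocks)), len(blocks))
            for name, blocks in blocks_by_function.items()}

# Hypothetical sample in the script's output format
sample = "[[4096, 'sub_401000', 3, [4096, 4112, 4130]], [4200, 'main', 2, [4200, 4216]]]"
blocks = load_blocks(sample)

# Offsets on which a debugger would place breakpoints
breakpoints = sorted(off for blks in blocks.values() for off in blks)

# Offsets reported as hit for one testcase (made-up values)
report = per_function_coverage(blocks, [4096, 4130, 4216])
```

Here `report` maps `sub_401000` to `(2, 3)` and `main` to `(1, 2)`, which shows at a glance which functions a testcase barely touches.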

At Volon, we have developed a proprietary code coverage technology designed to achieve maximum code coverage for a given application with a minimal corpus. This makes our product #OpFuzz more powerful and one of the fastest fuzzing environments.

#OpFuzz ships with a minimal corpus that gives maximum code coverage. It can also accept corpus data from the user for additional proprietary functionality within an application, extending its reach to code paths that cannot be reached with public corpus data alone.


